(54) Title: MICROASSAY FOR SERIAL ANALYSIS OF GENE EXPRESSION AND APPLICATIONS THEREOF
(57) Abstract

Method of obtaining a library of tags able to define a specific state of a biological sample, comprising the
following successive steps: (1) extracting mRNA in a single step from a small amount of a biological sample using
oligo(dT)25 covalently bound to paramagnetic beads, (2) generating a double-strand cDNA library from said mRNA,
(3) cleaving the obtained cDNAs using Sau3A I, (4) separating the cleaved cDNAs in two aliquots, (5) ligating the
cDNA contained in each of said two aliquots via said Sau3A I restriction site to a linker consisting of one
double-strand cDNA molecule having one of the following formulas: GATCGTCCC-X1 or GATCGTCCC-X2, wherein X1 and
X2, which comprise 30-37 nucleotides and are different, include a 20-25 bp PCR priming site with a Tm of
55°C-65°C, (6) digesting the products obtained in step (5) with the tagging enzyme BsmF I, (7) blunt-ending said
BsmF I tags with a DNA polymerase and mixing the tags ligated with the different linkers, (8) ligating the tags
obtained in step (7) to form ditags with a DNA ligase, (9) amplifying the ditags obtained in step (8) with
primers comprising 20-25 bp and having a Tm of 55°-65°C, (10) isolating the ditags having between 20 and 28 bp
from the amplification products obtained in step (9) by digesting said amplification products with Sau3A I and
separating the digested products, (11) ligating the ditags obtained in step (10) to form concatemers, purifying
said concatemers and separating the concatemers having more than 300 bp, (12) cloning and sequencing said
concatemers and (13) analysing the different obtained tags.
Microassay for serial analysis of gene expression and applications thereof

Several methods are now available for monitoring gene expression
on a genomic scale. These include DNA microarrays (1, 2) and macroarrays (3, 4),
expressed sequence tag (EST) determination (5, 6), and serial analysis of gene
expression (SAGE) (7). Such methods have been designed, and are still used, for analysing
macroamounts of biological material (1-5 µg of poly(A) mRNAs, i.e. ~10^7 cells).
However, mammalian tissues consist of several different cell types with specific
physiological functions and gene expression patterns. This obviously complicates
the interpretation of large-scale expression data in higher organisms. It is therefore
most desirable to develop methods suitable for the analysis of defined cell populations.

SAGE has been shown to provide rapid and detailed information on
transcript abundance and diversity (7-10). It involves several steps for mRNA
purification, cDNA tag generation and isolation, and PCR amplification. We reasoned that
increasing the yield of the various extraction procedures, together with slight
modifications in the number of PCR cycles, could extend the potential of SAGE. Here we
present a microadaptation of SAGE, referred to as SADE (11) since, in contrast to the
original method, it provides quantitative gene expression data on a small
number (30,000-50,000) of cells.

SAGE was first described by Velculescu et al. in 1995 (US Patent
5,695,937 and (7)), and rests on 3 principles, all of which have now been corroborated
experimentally: a) short nucleotide sequence tags (10 bp) are long enough to be
specific to a transcript, especially if they are isolated from a defined portion of each
transcript; b) concatenation of several tags within a single DNA molecule greatly
increases the throughput of data acquisition; c) the quantitative recovery of
transcript-specific tags allows representative gene expression profiles to be established.

However, said method was designed to study macroamounts of
biological materials (5 µg of poly(A) RNAs, i.e. about 10^7 cells). Since mammalian
tissues consist of several different cell types with specific physiological functions and
gene expression patterns, it is most desirable to scale down the SAGE approach for
studying well delineated tissue fragments or isolated cell populations.

The inventors have now found a new method able to handle
microamounts of samples.
The subject of the present invention is a method of obtaining a
library of tags able to define a specific state of a biological sample, such as a tissue or
a cell culture, characterised in that it comprises the following successive steps:

(1) extracting mRNA in a single step from a small amount of a
biological sample using oligo(dT)25 covalently bound to paramagnetic beads,

(2) generating a double-strand cDNA library from said mRNA
according to the following steps:

* synthesising the 1st strand of said cDNA by reverse transcription of
said mRNA template into a 1st complementary single-strand cDNA, using a reverse
transcriptase lacking RNase H activity,
* synthesising the 2nd strand of said cDNA by nick translation of the
mRNA, in the mRNA-cDNA hybrid form, by an E. coli DNA polymerase,

(3) cleaving the obtained cDNAs using the restriction endonuclease
Sau3A I as anchoring enzyme,

(4) separating the cleaved cDNAs in two aliquots,

(5) ligating the cDNA contained in each of said two aliquots via said
Sau3A I restriction site to a linker consisting of one double-strand cDNA molecule
having one of the following formulas:

GATCGTCCC-X1 or GATCGTCCC-X2,

wherein X1 and X2, which comprise 30-37 nucleotides and are
different, include a 20-25 bp PCR priming site with a Tm of 55°C-65°C, and
wherein GATCGTCCC (SEQ ID NO:1) corresponds to a Sau3A I
restriction site joined to a BsmF I restriction site,

(6) digesting the products obtained in step (5) with the tagging
enzyme BsmF I and releasing linkers with an anchored short piece of cDNA
corresponding to a transcript-specific tag, said digestion generating BsmF I tags
specific to the initial mRNA,

(7) blunt-ending said BsmF I tags with a DNA polymerase, preferably
T7 DNA polymerase or Vent polymerase, and mixing the tags ligated with the
different linkers,

(8) ligating the tags obtained in step (7) to form ditags with a DNA
ligase,
(9) amplifying the ditags obtained in step (8) with primers
comprising 20-25 bp and having a Tm of 55°-65°C,

(10) isolating the ditags having between 20 and 28 bp from the
amplification products obtained in step (9) by digesting said amplification products
with the anchoring enzyme Sau3A I and separating the digested products by an
appropriate gel electrophoresis,

(11) ligating the ditags obtained in step (10) to form concatemers,
purifying said concatemers and separating the concatemers having more than 300 bp,

(12) cloning and sequencing said concatemers and

(13) analysing the different obtained tags.
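
As an illustration of steps (12) and (13), the sketch below shows one way a sequenced concatemer might be split back into ditags and individual tags. It is a minimal sketch and not the inventors' software: the function names, the 10-bp tag length read from each end of a ditag, and the handling of fragment sizes are assumptions made for the example. A tag that itself contained GATC would be split incorrectly, which a real analysis would have to handle.

```python
from collections import Counter

ANCHOR = "GATC"  # Sau3A I site that separates ditags within a concatemer
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_ditags(concatemer: str, min_len: int = 20, max_len: int = 28) -> list[str]:
    """Split on the anchoring site and keep fragments of ditag size (20-28 bp)."""
    fragments = concatemer.upper().split(ANCHOR)
    return [f for f in fragments if min_len <= len(f) <= max_len]


def ditag_to_tags(ditag: str, tag_len: int = 10) -> tuple[str, str]:
    """Read one tag from each end of a ditag (tag_len is an assumed value)."""
    left = ditag[:tag_len]
    # The second tag lies on the opposite strand, so it is reverse-complemented.
    right = reverse_complement(ditag[-tag_len:])
    return left, right


def count_tags(concatemers: list[str]) -> Counter:
    """Tally tag occurrences over a list of concatemer sequences."""
    counts: Counter = Counter()
    for concatemer in concatemers:
        for ditag in split_ditags(concatemer):
            counts.update(ditag_to_tags(ditag))
    return counts
```

The resulting counts are the raw material for translating tag abundance into an expression profile; matching tags to genes is a separate step not shown here.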

Libraries of tags in the sense of the invention comprise at least two
tags, each of them defining at least one gene and potentially corresponding to a new
gene.

According to an advantageous embodiment of said method, in step
(2), said synthesis of the 1st strand of said cDNA is performed with Moloney Murine
Leukaemia Virus reverse transcriptase (M-MLV RT), and oligo(dT)25 as primers.

According to another advantageous embodiment of said method, the
linkers of step (5) are preferably hybrid DNA molecules formed from linkers 1A and
1B or from linkers 2A and 2B, having the following formulas:

linker 1A: 5'-TTTTGCCAGGTCACTCAAGTCGGTCATTCATGTCAGCACAGGGAC-3'
(SEQ ID NO:2)
linker 1B: 5'-GATCGTCCCTGTGCTGACATGAATGACCGACTTGAGTGACCTGGCA-3'
(SEQ ID NO:3)
or
linker 2A: 5'-TTTTTGCTCAGGCTCAAGGCTCGTCTAATCACAGTCGGAAGGGAC-3'
(SEQ ID NO:4)
linker 2B: 5'-GATCGTCCCTTCCGACTGTGATTAGACGAGCCTTGAGCCTGAGCAA-3'
(SEQ ID NO:5).
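
The relationship between the A and B strands of each linker can be checked with a short script. The sketch below is illustrative only: the helper names are hypothetical, and the check simply tests whether the reverse complement of strand A matches strand B downstream of its GATC overhang, and whether the X part of strand B falls within the 30-37 nucleotide range stated above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

SITE = "GATCGTCCC"  # SEQ ID NO:1: Sau3A I site joined to a BsmF I site

LINKERS = {
    "1A": "TTTTGCCAGGTCACTCAAGTCGGTCATTCATGTCAGCACAGGGAC",
    "1B": "GATCGTCCCTGTGCTGACATGAATGACCGACTTGAGTGACCTGGCA",
    "2A": "TTTTTGCTCAGGCTCAAGGCTCGTCTAATCACAGTCGGAAGGGAC",
    "2B": "GATCGTCCCTTCCGACTGTGATTAGACGAGCCTTGAGCCTGAGCAA",
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def describe_pair(strand_a: str, strand_b: str) -> dict:
    """Summarise how strand A anneals to strand B of one linker."""
    paired_b = strand_b[len("GATC"):]  # strand B without its GATC overhang
    rc_a = reverse_complement(strand_a)
    return {
        "b_starts_with_site": strand_b.startswith(SITE),
        "x_length": len(strand_b) - len(SITE),
        "a_pairs_with_b": rc_a.startswith(paired_b),
        "a_unpaired_nt": len(rc_a) - len(paired_b),
    }


if __name__ == "__main__":
    for name in ("1", "2"):
        print(name, describe_pair(LINKERS[name + "A"], LINKERS[name + "B"]))
```

For both linker pairs, strand B begins with GATCGTCCC followed by a 37-nucleotide X region, and strand A is complementary to strand B apart from a few unpaired T residues at its 5' end.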

According to another advantageous embodiment of said method, the
amount of each linker in step (5) is at most 8-10 pmol and is preferably
between 0.5 pmol and 8 pmol for initial amounts of, respectively, 10-40 ng of mRNAs
and 5 µg of mRNAs.
According to yet another advantageous embodiment of said method,
the primers of step (9) have preferably the following formulas:

5'-GCCAGGTCACTCAAGTCGGTCATT-3' (SEQ ID NO:6)
5'-TGCTCAGGCTCAAGGCTCGTCTA-3' (SEQ ID NO:7).
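
The length and Tm constraints on the primers can be illustrated with a rough calculation. The text does not state how Tm was determined, so the sketch below assumes a commonly used GC-content approximation, Tm = 64.9 + 41 x (GC - 16.4) / N; other formulas (for example nearest-neighbour models) give somewhat different values. The function names are hypothetical.

```python
PRIMERS = {
    "SEQ ID NO:6": "GCCAGGTCACTCAAGTCGGTCATT",
    "SEQ ID NO:7": "TGCTCAGGCTCAAGGCTCGTCTA",
}


def gc_count(seq: str) -> int:
    """Number of G and C residues in the sequence."""
    return seq.count("G") + seq.count("C")


def estimate_tm(seq: str) -> float:
    """Approximate Tm in degrees C (assumed formula, not taken from the text)."""
    return 64.9 + 41.0 * (gc_count(seq) - 16.4) / len(seq)


def within_spec(seq: str) -> bool:
    """Check the 20-25 nt length and 55-65 degrees C Tm window."""
    return 20 <= len(seq) <= 25 and 55.0 <= estimate_tm(seq) <= 65.0


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), round(estimate_tm(seq), 1), within_spec(seq))
```

Under this approximation both primers come out at about 59°C, inside the 55°-65°C window.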

According to yet another advantageous embodiment of said method,
the biological sample of step (1) preferably comprises < 5 x 10^6 cells, corresponding to
at most 50 µg of total RNA or 1 µg of poly(A) RNA.

According to the invention, biological sample means for instance:
tissue, cells (native or cultured cells), which are lysed for extracting mRNA.

According to another advantageous embodiment of said method,
said tissue sample is from kidney, more specifically from nephron segments
corresponding to about 15,000 to 45,000 cells, corresponding to 0.15-0.45 µg of total
RNA.

The subject of the present invention is also the use of a library of
tags obtained according to the method as defined above, for assessing the state of a
biological sample, such as a tissue or a cell culture.

The subject of the present invention is also the use of the tags obtained
according to the method as defined above as probes.

The subject of the present invention is also a method of
determination of a gene expression profile, characterised in that it comprises:
. performing steps (1) to (13) according to claim 1 and
. translating cDNA tag abundance into a gene expression profile.

According to a preferred embodiment of said method, the gene
expression profile obtained in mouse outer medullary collecting duct (OMCD) and in
mouse medullary thick ascending limb (MTAL) is as specified in Table I below:
The invention also relates to a kit useful for detection of gene
expression profile, characterised in that the presence of a cDNA tag, obtained from the
mRNA extracted from a biological sample, is indicative of expression of a gene
having said tag sequence at an appropriate position, i.e. immediately adjacent to the
most 3' Sau3A I site in said cDNA, obtained from said mRNA, the kit comprising,
further to usual buffers for cDNA synthesis, restriction enzyme digestion, ligation and
amplification,

- containers containing a linker consisting of one double-strand
cDNA molecule having one of the following formulas:

GATCGTCCC-X1 or GATCGTCCC-X2,

wherein X1 and X2, which comprise 30-37 nucleotides and are
different, include a 20-25 bp PCR priming site with a Tm of 55°C-65°C, and
wherein GATCGTCCC (SEQ ID NO:1) corresponds to a Sau3A I
restriction site joined to a BsmF I restriction site, and

- containers containing primers comprising 20-25 bp and having a
Tm of 55°-65°C.

According to an advantageous embodiment of said kit, it preferably
contains

- containers containing hybrid DNA molecules formed from linkers
1A and 1B or from linkers 2A and 2B, having the following formulas:

linker 1A: 5'-TTTTGCCAGGTCACTCAAGTCGGTCATTCATGTCAGCACAGGGAC-3'
(SEQ ID NO:2),
linker 1B: 5'-GATCGTCCCTGTGCTGACATGAATGACCGACTTGAGTGACCTGGCA-3'
(SEQ ID NO:3), or
linker 2A: 5'-TTTTTGCTCAGGCTCAAGGCTCGTCTAATCACAGTCGGAAGGGAC-3'
(SEQ ID NO:4)
linker 2B: 5'-GATCGTCCCTTCCGACTGTGATTAGACGAGCCTTGAGCCTGAGCAA-3'
(SEQ ID NO:5), and

- containers containing the following primers:

5'-GCCAGGTCACTCAAGTCGGTCATT-3' (SEQ ID NO:6)
5'-TGCTCAGGCTCAAGGCTCGTCTA-3' (SEQ ID NO:7).
As compared to SAGE, the instant SADE method includes the
following features: 1) single-step mRNA purification from tissue lysate; 2) use of a
reverse transcriptase lacking RNase H activity; 3) use of a different anchoring enzyme;
4) modification of procedures for blunt-ending cDNA tags; 5) design of new linkers
and PCR primers.

Figure 1, modified from the original studies of Velculescu et al.,
summarises the different steps of the SADE method, which is a microadaptation of
SAGE. Briefly, as already specified above, mRNAs are extracted using
oligo(dT)25 covalently bound to paramagnetic beads. Double-strand cDNA is
synthesised from mRNA using oligo(dT)25 as primer for the 1st strand synthesis. The cDNA
is then cleaved using a restriction endonuclease (anchoring enzyme: Sau3A I) with a
4-bp recognition site. Since such an enzyme cleaves DNA molecules every 256 bp
(4^4) on average, virtually all cDNAs are predicted to be cleaved at least once. The 3'
end of each cDNA is isolated using the property of the paramagnetic beads and
divided in half. Each of the two aliquots is ligated via the anchoring enzyme
restriction site to one of the two linkers containing a type IIS recognition site (tagging
enzyme: BsmF I) and a priming site for PCR amplification. Type IIS restriction
endonucleases display recognition and cleavage sites separated by a defined length (14 bp
for BsmF I), irrespective of the intercalated sequence. Digestion with the type IIS
restriction enzyme thus releases linkers with an anchored short piece of cDNA,
corresponding to a transcript-specific tag. After blunt-ending of tags, the two aliquots
are linked together and amplified by PCR. Since all targets are of the same length (110
bp) and are amplified with the same primers, potential distortions introduced by PCR
are greatly reduced. Furthermore, these distortions can be evaluated, and the data
corrected accordingly (7, 8). Ditags present in the PCR products are recovered through
digestion with the anchoring enzyme and gel purification, then concatenated and
cloned.
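
The statement that a 4-bp cutter cleaves every 256 bp on average, and therefore cuts virtually all cDNAs, can be made concrete with a small calculation. The sketch below is a simplification: it assumes a random sequence with equal base frequencies and treats positions as independent, which real cDNAs only approximate. The function names and the example cDNA length are illustrative.

```python
def expected_spacing(site_length: int) -> int:
    """Average distance (bp) between occurrences of a site of given length."""
    return 4 ** site_length


def probability_uncut(cdna_length: int, site_length: int = 4) -> float:
    """Approximate probability that a cDNA contains no recognition site."""
    positions = max(cdna_length - site_length + 1, 0)
    per_position = 1.0 / 4 ** site_length
    return (1.0 - per_position) ** positions


if __name__ == "__main__":
    print(expected_spacing(4))
    for length in (500, 1500, 3000):  # illustrative cDNA lengths in bp
        print(length, probability_uncut(length))
```

Under these assumptions, the fraction of uncut molecules falls quickly with cDNA length, in line with the prediction that nearly every cDNA carries at least one Sau3A I site.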

In the SAGE method, mRNAs are isolated using conventional
methods, then hybridized to biotinylated oligo(dT) for cDNA synthesis. After
cleavage with the anchoring enzyme Nla III, the biotinylated cDNA fraction (3' end)
is purified by binding to streptavidin beads.

In the SADE method, mRNAs are directly isolated from the tissue
lysate through hybridization to oligo(dT) covalently bound to magnetic beads. Then,
all steps of the experiment (until step 3 of protocol 5, as described hereafter) are
performed on magnetic beads. This procedure saves time for the initial part of the
experiment and, more importantly, provides better recovery. Quantitative analysis of
the cDNA amounts available for library construction revealed dramatic differences
between SAGE and SADE. With the SAGE method, starting from 500 mg of tissue,
1.7 µg of cDNA was obtained, and only 4 ng was able to bind to streptavidin beads
after Sau3A I digestion. With the SADE method, starting from 250 mg of tissue,
3.2 µg of cDNA was synthesised on beads, and 0.5 µg remained bound after Sau3A I
cleavage. The increased yield of SADE (x250) explains the success in constructing
libraries from as few as 30,000 cells. Using different sources of oligo(dT) led to
poor cDNA recoveries. This may be explained by the fact that the binding capacity of
streptavidin beads can be altered by several parameters, such as the presence of
phenol, the length and composition of the biotinylated DNA fragments, and the length
of the spacer between the oligo(dT) and the biotin molecule.
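
The 250-fold figure quoted above follows from normalising the bead-bound cDNA to the amount of starting tissue. The sketch below reproduces that arithmetic, reading the bound SADE amount as 0.5 µg (500 ng); the function name is hypothetical.

```python
def bound_cdna_per_mg(bound_cdna_ng: float, tissue_mg: float) -> float:
    """Bead-bound cDNA (ng) recovered per mg of starting tissue."""
    return bound_cdna_ng / tissue_mg


# SAGE: 4 ng bound to streptavidin beads from 500 mg of tissue.
sage_yield = bound_cdna_per_mg(4.0, 500.0)
# SADE: 0.5 ug (500 ng) still bound to oligo(dT) beads from 250 mg of tissue.
sade_yield = bound_cdna_per_mg(500.0, 250.0)

fold_increase = sade_yield / sage_yield

if __name__ == "__main__":
    print(sage_yield, sade_yield, round(fold_increase))
```

The per-milligram yields are 0.008 ng for SAGE and 2 ng for SADE, which gives the stated ratio of 250.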

Another important difference between SAGE and SADE concerns
the selected anchoring enzyme. Although any restriction enzyme with a 4-bp
restriction site could serve as anchoring enzyme, Sau3A I was preferred to Nla III (7-10) or
other enzymes in our studies. Several cDNA libraries used for large-scale sequencing
are constructed by vector priming, followed by cDNA cleavage with Mbo I (an
isoschizomer of Sau3A I which does not cut the (methylated) vector DNA), and
circularisation (6). SADE tags therefore correspond to the cDNA 5' ends of these
libraries, which allows EST databases to be used more efficiently to analyse the data.
In addition to the preceding arrangements, the invention further
comprises other arrangements, which will emerge from the description which follows,
which refers to examples for carrying out the process which is the subject of the
present invention as well as to the accompanying drawings, in which:

- Figure 1: Outline of procedures for constructing SADE libraries.
Poly(A) RNAs are isolated from tissue lysate using oligo(dT)25 covalently linked to
paramagnetic beads, and cDNA is synthesised under solid-phase conditions. Bold face
characters correspond to biologically relevant sequences, whereas light characters
represent linker-derived sequences. The anchoring enzyme (AE) is Sau3A I, whereas
the tagging enzyme (TE) is BsmF I. See text for details.

- Figure 2: Gel analysis of cDNAs synthesised from different
amounts of tissue. Poly(A) RNAs were isolated from the indicated amounts of mouse
kidney, and cDNAs were synthesised and Sau3A I-digested on paramagnetic beads
(see protocols 1-2). cDNAs released from beads were recovered, and half of the
material obtained from each reaction was analysed on a 1% agarose gel stained with
ethidium bromide. Positions of molecular weight markers are indicated in bp: left, λ
BstE II digest; right, pBR Msp I digest.

- Figure 3: PCR amplification of ditags. Poly(A) RNAs were
isolated from 50 or 150 mm of microdissected nephron segments (corresponding to
about 15,000 and 45,000 cells, respectively). The corresponding ditags were amplified
by PCR using the indicated number of cycles and analysed on a 3% agarose gel
stained with ethidium bromide. The expected product (linkers + 1 ditag) is 110 bp
long. Molecular weight marker (M) is a 10-bp DNA ladder (Life Technologies).

- Figure 4: Gel analysis of concatemers. Ditags were concatenated
by ligation (2 h at 16°C), then electrophoresed through an 8% polyacrylamide gel. The
gel was post-stained using SYBR Green I, and visualised by UV illumination at 305
nm. Migration of the molecular weight marker (λ BstE II digest) is indicated on the
right.

- Figure 5: Comparison of gene expression levels in two nephron
portions of the mouse kidney. SADE libraries were constructed from ~50,000 cells
isolated by microdissection from medullary collecting ducts or medullary thick
ascending limbs, and 5,000 tags were sequenced in each case. The data show the 18
most abundant collecting duct tags originating from nuclear transcripts (mitochondrial
tags were excluded from the analysis), and their corresponding abundance in the thick
ascending limb library.
It should be understood, however, that these examples are given
solely by way of illustration of the subject of the invention and do not in any way
constitute a limitation thereto.
Example:

1. Tissue sampling and mRNA isolation

1.1 Tissue sampling and lysis

The initial steps of library construction require the usual precautions
recommended for experiments carried out with RNAs (12). In addition, since library
construction involves large scale PCR (Protocol 6), care must be taken to avoid
contamination from previous libraries. Working under PCR grade conditions is
especially important when low amounts of tissue or cells are used.

Starting from whole tissues (e.g. kidney, liver, brain, ...), the
following procedures may be routinely used. After animal anaesthesia or decapitation,
the tissue is removed as quickly as possible, rapidly rinsed in ice-cold
phosphate-buffered saline, sliced in ~50 mg pieces, and frozen in liquid nitrogen. The frozen
sample is then ground to a fine powder under liquid nitrogen using a mortar and
pestle, transferred into lysis binding buffer (protocol 7), and homogenised with a
Dounce tissue disrupter. To avoid loss of material, small samples (<20 mg) can be
transferred without previous freezing into the lysis binding buffer, and homogenised in a
1 ml Dounce. The respective amounts of tissue and lysis binding buffer needed for a
variety of conditions are indicated in Table II.
Table II. Small and large scale mRNA isolation and cDNA synthesis

                                                            Reaction volume (µl)
Tissue / cells            Lysis binding   Oligo(dT)         1st strand   2nd strand
                          buffer (ml)     beads (µl)
250 mg / 3x10^7           5.50            600               50           400
30 mg / 3x10^6-6x10^6     0.70            100               50           400
4 mg / 10^5-10^6          0.10            30                25           200
0.5 mg / 3x10^4-10^5      0.05-0.10       20                25           200

Starting from isolated or cultured cells, the procedure is much more
rapid. The cell suspension, maintained in appropriate culture or survival medium, just
needs to be centrifuged at 600-1,200 g for 5 min. After supernatant removal, the lysis
binding buffer is added onto the cell pellet, and the sample is homogenised by
vortexing. This procedure has been successfully applied to 3x10^4-3x10^7 cells (Table
II).

1.2. mRNA isolation

Protocols 1-7 describe the generation of a SADE library from 0.5
mg of tissue. The amount of cDNA recovered corresponds to an experiment carried
out on the mouse kidney. Slightly different amounts are expected to be obtained from
other tissues, according to their mRNA content. The procedures described herein have
been repeatedly used without modifications with 3x10^4-10^5 isolated cells. Since some
applications can be performed on large amounts of tissue or cells, protocol adaptations
and anticipated results for these kinds of experiments are also provided.

In the initial experiments, RNAs were extracted using standard
methods (13), and poly(A) RNAs were isolated on oligo(dT) columns. Besides being
time consuming, this procedure provides low and variable mRNA amounts, and
cannot be easily scaled down. The alternative procedure described here (use of
oligo(dT)25 covalently linked to paramagnetic beads) is a single-tube assay for mRNA
isolation from tissue lysate. In our hands, it yields 4 times higher mRNA amounts
than standard methods. Kits and helpful instructions for mRNA isolation with
oligo(dT) beads can be obtained from Dynal. Handling of these beads is relatively
simple, but care must be taken to avoid centrifugation, drying or freezing, since all
three processes are expected to lower their binding capacity. On the other hand, beads
can be resuspended by gentle vortexing or pipetting without extreme precautions.

Protocol 1. mRNA purification

Equipment and reagents

. Appropriate tissue or cells.
. Dynabead mRNA direct kit (Dynal, ref. 610-11) containing Dynabeads oligo(dT)25,
lysis binding buffer, and washing buffers.
. 5X reverse transcription (RT) buffer (250 mM Tris-HCl (pH 8.3), 375 mM KCl, 15
mM MgCl2), provided with cDNA synthesis kit (see protocol 2).
. Magnetic Particle Concentrator (MPC) for 1.5 ml tubes (Dynal, ref. 12004).
. Glycogen for molecular biology (Boehringer Mannheim, ref. 901393).

Method

1. Lyse the tissue sample in 100 µl lysis binding buffer supplemented with 10 µg
glycogen.

2. Add 20 µl of Dynabeads in a 1.5 ml tube and condition them according to the
manufacturer's instructions.

3. Using the MPC, remove the supernatant from the Dynabeads and add the tissue
lysate (100 µl). Mix by vortexing and anneal mRNAs to the beads by incubating 10
min at room temperature.

4. Place the tube 2 to 5 min in the MPC and remove the supernatant. The mRNAs are
fixed on the beads.

5. Using the MPC, perform the following washes (all buffers contain 20 µg/ml
glycogen): twice with 200 µl washing buffer containing lithium dodecyl sulfate (LiDS),
3 times with 200 µl washing buffer, and twice with 200 µl ice-cold 1X RT buffer.
Resuspend the beads by pipetting, transfer the suspension into a fresh 1.5 ml tube,
wash once with 200 µl ice-cold 1X RT buffer and immediately proceed to protocol
2. mRNAs on the beads are now ready for 1st strand cDNA synthesis.

WO 00/44936 PCT/IB00/00111

1.3 mRNA integrity and purity

Before generating a cDNA library, it is generally advised to check
for mRNA integrity by Northern blot analysis. However, this control experiment
consumes part of the material, takes several days, and often leads to ambiguous results
(a variety of reasons can cause poor Northern hybridisation signals). In addition, it is
no longer possible when using small amounts of tissue or cells. RNA degradation is
only to be expected under the following three conditions: 1) cell survival is not
maintained before lysis or freezing; 2) cells are thawed outside of lysis buffer; and 3)
poor quality reagents are used. Since RNase-free reagents are now available from a variety of
companies, it is much more rapid and effective to check for survival (i.e. select the
appropriate culture medium) and freezing conditions than to perform tricky tests on
RNA aliquots.

The purity of mRNAs isolated with oligo(dT) beads is better than
that obtained with conventional methods. When we generated SAGE libraries using
mRNAs extracted with guanidinium thiocyanate and oligo(dT) columns, nuclear
encoded rRNAs amounted to 1% of the sequenced tags. Using the alternative mRNA
extraction procedure, rRNA tags are no longer present in the library.
2. 1st and 2nd strand synthesis

2.1. 1st strand cDNA synthesis

The first step in the synthesis of cDNA is copying the mRNA
template into complementary single-strand cDNA. In protocol 2, 1st strand cDNA is
synthesised using Moloney Murine Leukaemia Virus RT (M-MLV RT). With this
enzyme, we have been able to generate SADE libraries from either large or minute
amounts of cells (Table 1). In our last series of experiments, we have however used
Superscript II M-MLV RT, provided with the Superscript cDNA synthesis kit (Life
Technologies, ref. 18090-019). In this case, the amount of cDNA formed (see 2.3) was
increased ~4-fold. Although this better yield likely results from both the synthesis of
longer cDNAs (which is not essential for the current application) and of a higher
number of cDNA molecules, we strongly recommend using Superscript II M-MLV
RT for very small samples (<50,000 cells). The protocol will be similar to the one
described here, except for reaction volumes (20 µl for 1st strand synthesis, and 150 µl
for 2nd strand synthesis).

mRNAs are generally heated 5 min at 65°C before reverse
transcription to break up secondary structures. Since such a high temperature will also denature
the mRNA-oligo(dT)25 hybrid, we only heat the sample at 42°C before initiation of 1st
strand synthesis.

2.2. 2nd strand synthesis

Many procedures have been developed for 2nd strand cDNA
synthesis. The method used here is a modification of the Gubler and Hoffman procedure.
Briefly, the mRNA (in the mRNA-cDNA hybrid) is nicked by E. coli RNase H. E.
coli DNA polymerase initiates the second strand synthesis by nick translation. E. coli
DNA ligase seals any breaks left in the second strand cDNA. The procedure is
described in protocol 2. This step is usually very efficient (approximately 100%) so
that a 2 h incubation period is sufficient when starting from macroamounts of material
(>100 mg of tissue or 10^7 cells).
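
The reagent volumes listed in Protocol 2 for the small-scale reaction can be cross-checked against the reaction volumes of Table II (25 µl for 1st strand, 200 µl for 2nd strand). The sketch below simply adds up the listed components; the dictionary keys are shorthand labels chosen for the example, and `Decimal` is used only to avoid floating-point rounding in the sums.

```python
from decimal import Decimal

# Component volumes in microlitres, as listed in Protocol 2 (steps 3 and 4).
FIRST_STRAND_MIX = {
    "DEPC-treated water": Decimal("5"),
    "5X first strand buffer": Decimal("2.5"),
    "dNTP 10 mM": Decimal("1.25"),
    "DTT 100 mM": Decimal("2.5"),
    "M-MLV reverse transcriptase": Decimal("1.25"),
}
SECOND_STRAND_MIX = {
    "DEPC-treated water": Decimal("169.7"),
    "dNTP 10 mM": Decimal("4.5"),
    "2nd strand buffer": Decimal("24"),
    "alpha-[32P]dCTP": Decimal("2"),
    "E. coli DNA polymerase I": Decimal("6"),
    "E. coli RNase H": Decimal("1.05"),
    "E. coli DNA ligase": Decimal("0.75"),
    "glycogen 5 ug/ul": Decimal("2"),
}

BEADS_IN_BUFFER = Decimal("12.5")     # beads resuspended in 1X first strand buffer
SECOND_MIX_ADDED = Decimal("175")     # portion of the second strand mix added


def total(mix: dict) -> Decimal:
    """Sum of all component volumes in a mix."""
    return sum(mix.values(), Decimal("0"))


first_strand_volume = BEADS_IN_BUFFER + total(FIRST_STRAND_MIX)
second_strand_volume = first_strand_volume + SECOND_MIX_ADDED
leftover_mix = total(SECOND_STRAND_MIX) - SECOND_MIX_ADDED

if __name__ == "__main__":
    print(first_strand_volume, second_strand_volume, leftover_mix)
```

The sums give a 25 µl first strand reaction and a 200 µl second strand reaction, and leave 35 µl of second strand mix for measuring dCTP specific activity.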



Protocol 2. cDNA synthesis and cleavage

Equipment and reagents

. cDNA synthesis kit (Life Technologies, ref. 18267-013), containing all buffers and
enzymes necessary for first and second strand cDNA synthesis.
. α[32P]dCTP 6000 Ci/mmol (Amersham, ref. AA0075).
. TEN (10 mM Tris-HCl (pH 8.0), 1 mM EDTA, 1 M NaCl).
. Restriction endonuclease Sau3A I 4 U/µl (New England Biolabs, ref. 169L),
provided with 10X reaction buffer and purified 100X bovine serum albumin (BSA,
10 mg/ml).
. Magnetic Particle Concentrator MPC (Dynal).
. Geiger counter.
. Automated thermal cycler or water-baths equilibrated at 42°C, 37°C, and 16°C.
Method

1. Resuspend the beads in 12.5 µl of 1X first strand (i.e. RT) reaction buffer.
2. Incubate 2 min at 42°C.
3. Place the tube at 37°C for 2 min. Add 12.5 µl of the following mix: 5 µl DEPC-
treated water, 2.5 µl 5X first strand buffer, 1.25 µl dNTP 10 mM, 2.5 µl DTT
100 mM, 1.25 µl MMLV reverse transcriptase.
4. Incubate 1 h at 37°C and chill on ice.
5. On ice, prepare the following mix: 169.7 µl DEPC-treated water, 4.5 µl dNTP
10 mM, 24 µl 2nd strand buffer, 2 µl α[32P]dCTP, 6 µl E. coli DNA polymerase I,
1.05 µl E. coli RNAse H, 0.75 µl E. coli DNA ligase, 2 µl glycogen 5 µg/µl.
6. Add 175 µl to the first strand tube and incubate overnight at 16°C. Keep the
remaining mix for subsequent measurement of its radioactivity and calculation of
dCTP specific activity.
7. Wash the beads to remove non-incorporated α[32P]dCTP: 4 times with 200 µl TEN +
BSA^a, and 3 times with 200 µl ice-cold 1X mix Sau3A I + BSA^a. Check with a
Geiger counter that the last eluate is not radioactive, whereas the material bound to
the beads is highly radioactive.
8. Add to the beads the following mix: 88 µl H2O, 10 µl 10X mix Sau3A I, 1 µl
100X BSA, 1 µl Sau3A I. Incubate 2 h at 37°C, vortexing intermittently.
9. Chill 5 min on ice.
10. Using the MPC, remove the supernatant, which contains the 5' end of the cDNA.
Wash once with 200 µl of 1X mix Sau3A I + BSA^a. Remove this second supernatant,
pool it with the first one, and store the resulting solution (300 µl) in order to
measure the yield of second strand synthesis (see text). Before going to step 11,
check with a Geiger counter that both the eluate and the beads are radioactive.
11. Resuspend the beads in 200 µl TEN supplemented with BSA^a.

^a Final concentration of BSA: 0.1 mg/ml.

2.3. Yield of 2nd strand synthesis

A method to calculate the yield of first and second strand cDNA synthesis is given in
the cDNA synthesis kit instruction manual. We do not measure the yield of 1st strand
cDNA synthesis since, as discussed above (1.3), this would require setting aside part
of the preparation.

The amount of double strand (ds) cDNA formed is calculated by measuring radioactivity
incorporation in the 5' end of the cDNA, which is released in the supernatant after
Sau3A I digestion (see Protocol 2). The 300-µl supernatant is extracted with PCI and
the ds cDNA is ethanol precipitated in the presence of glycogen (50 µg/ml) and 2.5 M
ammonium acetate. The pellet is resuspended in 8 µl of TE. Half of the material is used
for liquid scintillation counting, and the remainder is loaded on a 1.0 or 1.5% agarose
gel. For experiments carried out on 250, 30, 4, and 0.5 mg of mouse kidney, we obtained
the following amounts (µg) of ds cDNA: 2.8, 0.3, 0.05, and 0.01. The highest amount
corresponds to the incorporation of 1.3% of the input radioactivity. In these
experiments, three of the four cDNA samples could be detected by ethidium bromide
staining after gel electrophoresis (Fig. 2). Their size ranged between <0.2 and ~3 kbp
(the small size of most cDNA fragments is due to Sau3A I digestion). When cDNA amounts
are below the detection threshold of ethidium bromide staining, autoradiographic
analysis can be performed. In this case, the gel is fixed in 10% acetic acid, vacuum
dried, and exposed overnight at -80°C with one intensifying screen for autoradiography.
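
The arithmetic behind such a yield estimate can be sketched as follows. This is only
an illustration, not the procedure of the kit manual: the function names, the
assumption that dCMP represents one quarter of the incorporated nucleotides, the
average mass of 330 pg per pmol of nucleotide, and the assumption that only the second
strand is labelled are all introduced here for the example.

```python
# Sketch: ds cDNA mass from [32P]dCTP incorporation (assumed model, see lead-in).
AVG_NT_MASS_PG_PER_PMOL = 330.0  # assumed average mass of one nucleotide (pg/pmol)
DCMP_FRACTION = 0.25             # assumed: dCMP is 1/4 of incorporated nucleotides


def specific_activity(cpm_in_aliquot: float, pmol_dctp_in_aliquot: float) -> float:
    """Specific activity of dCTP (cpm per pmol), measured on the saved reaction mix."""
    if pmol_dctp_in_aliquot <= 0:
        raise ValueError("pmol_dctp_in_aliquot must be positive")
    return cpm_in_aliquot / pmol_dctp_in_aliquot


def ds_cdna_ng(incorporated_cpm: float, sa_cpm_per_pmol: float) -> float:
    """Mass (ng) of ds cDNA, assuming only the second strand is labelled."""
    if sa_cpm_per_pmol <= 0:
        raise ValueError("sa_cpm_per_pmol must be positive")
    pmol_dcmp = incorporated_cpm / sa_cpm_per_pmol
    pmol_nt_second_strand = pmol_dcmp / DCMP_FRACTION
    second_strand_pg = pmol_nt_second_strand * AVG_NT_MASS_PG_PER_PMOL
    return 2.0 * second_strand_pg / 1000.0  # two strands; pg -> ng
```

Because only half of the resuspended pellet is counted, the measured counts would
have to be doubled before being passed to `ds_cdna_ng`.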
3. Linkers design, preparation, and ligation
3.1. Linkers design

A variety of linkers can be used at this point. Linkers must contain three important
sequences: a) the appropriate anchoring enzyme overhang; b) a recognition site for a
type IIs restriction enzyme (tagging enzyme); c) a priming site for PCR amplification.
High quality linkers are crucial for successful library generation.

Table III provides the sequence of linkers and PCR primers used in our experiments.
All four linkers must be obtained gel-purified. Linkers 1B and 2B display two
modifications: a) 5' end phosphorylation, and b) C7 amino modification on the 3' end.
Linker phosphorylation can be performed either enzymatically with T4 polynucleotide
kinase, or chemically at the time of oligonucleotide synthesis. In both cases,
phosphorylation efficiency must be tested (Protocol 3). We use chemically
phosphorylated linkers. Modification of the linkers on the 3' end serves to increase
the efficiency of ditag formation (protocol 5, steps 8-11). Indeed, the modified 3' end
cannot be blunt-ended and will not ligate to cDNA tags or linkers.

Table III. Sequence of linkers and PCR primers

Oligonucleotide    Sequence
Linker 1A          SEQ ID NO:2
Linker 1B^a        SEQ ID NO:3
Linker 2A          SEQ ID NO:4
Linker 2B^a        SEQ ID NO:5
Primer 1           SEQ ID NO:6
Primer 2           SEQ ID NO:7

^a Linkers 1B and 2B include two modifications (5'-phosphorylation and 3'-C7 amino
modification).

With regard to the PCR priming site, it was designed with the help of Oligo™ software
(Medprobe, Norway) in order to obtain PCR primers with a high Tm (60°C) and to avoid
self-priming or sense/antisense dimer formation. Two different priming sites must be
designed in the "left" and "right" linkers; otherwise the target will undergo
panhandle formation and thus escape PCR amplification.
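
A rough primer screen of this kind can be sketched in a few lines. This is not the
algorithm of the Oligo software: the Tm is estimated here with a simple empirical
GC-content formula, the self-priming test only looks for a perfect match of the last
four bases, and the example sequence is hypothetical (it is not SEQ ID NO:6 or 7).

```python
# Rough primer screen: GC-based Tm estimate and a 3'-end self-complementarity check.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def tm_gc(seq: str) -> float:
    """Empirical estimate: Tm = 64.9 + 41 * (G + C - 16.4) / N (degrees C)."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(s)


def three_prime_self_priming(seq: str, window: int = 4) -> bool:
    """True if the last `window` bases can pair with a stretch of the same primer."""
    s = seq.upper()
    return reverse_complement(s[-window:]) in s


primer = "GGATTTGCTGGTGCAGTACA"  # hypothetical 20-mer, for illustration only
print(round(tm_gc(primer), 1), three_prime_self_priming(primer))
```

A candidate whose estimated Tm falls far from the 60°C target, or whose 3' end can
pair with itself or with the other primer, would be redesigned.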
Protocol 3. Preparing and testing linkers
Equipment and reagents
. Linkers 1A, 1B, 2A, and 2B at 20 pmol/µl.
. Primers 1 and 2 at 20 pmol/µl.
. T4 DNA ligase 1 U/µl (Life Technologies, ref. 15224-017) and 5X reaction buffer.
. 10 mM ATP.
. PCR reagents: Taq DNA polymerase 5 U/µl (Eurobio), 10X PCR buffer (200 mM
Tris-HCl (pH 8.3), 15 mM MgCl2, 500 mM KCl, 1 mg/ml gelatin), 1.25 mM
dNTP, 100 mM MgCl2, and 100 mM DTT.
. Restriction endonucleases Sau3A I (4 U/µl) and BsmF I (2 U/µl) (New England
Biolabs, refs. 169L and 572L), provided with 10X reaction buffer and 100X BSA.
. 10-bp DNA ladder (Life Technologies, ref. 10821-015).
. Automated thermal cycler and water baths equilibrated at 14°C, 37°C, and 65°C.
. Tris-HCl buffered (pH 7.9) phenol-chloroform-isoamyl alcohol (PCI).
. 10 M ammonium acetate.
. TE (10 mM Tris-HCl (pH 8.0), 1 mM EDTA).
Method

1. Mix 25 µl of linker 1A and 25 µl of linker 1B in a 0.5 ml PCR tube (final
concentration: 10 pmol/µl). Proceed similarly for linkers 2A and 2B.
2. Transfer the PCR tubes to the thermal cycler. Heat at 95°C for 2 min, then let cool
at room temperature for 20 min on the bench. Store at -20°C.
3. Test self-ligation of each hybrid, as well as ligation of hybrid (1A/1B) with hybrid
(2A/2B). Set up 3 ligation reactions by mixing 1 µl of hybrid (1A/1B) (tube 1),
1 µl of hybrid (2A/2B) (tube 2), or 0.5 µl of hybrid (1A/1B) and 0.5 µl of hybrid
(2A/2B) (tube 3) with 2 µl 10 mM ATP, 4 µl 5X ligase mix, 12 µl H2O, and 1 µl T4
DNA ligase.
4. Incubate 2 h or overnight at 14°C. Analyse 10-µl aliquots on a 3% agarose gel
using the 10-bp DNA ladder as marker. Most of the material (>80%) consists of a
94-bp DNA fragment.
5. Proceed to PCR using 10^3 targets from the tube 3 reaction (dilute using TE buffer
supplemented with 0.1 mg/ml BSA). Mix 1 µl of diluted ligation product, 5 µl 10X
PCR buffer, 1 µl 100 mM MgCl2, 8 µl 1.25 mM dNTP, 2.5 µl primer 1, 2.5 µl
primer 2, and 30 µl water. Prepare 4 such reactions and a control tube without
linker, transfer to the thermal cycler and heat at 80°C for 2 min.
6. Add to each tube 50 µl of Taq polymerase amplification mix (5 µl 10X PCR buffer,
4 µl 100 mM DTT, 0.5 µl Taq polymerase, 40.5 µl water), and 60 µl of mineral oil
if necessary for your thermal reactor.
7. Perform 29 PCR cycles (95°C, 30 s; 58°C, 30 s; 70°C, 45 s), followed by an
additional cycle with a 5-min elongation time.
8. Analyse 10-µl aliquots on a 3% agarose gel. A 90-bp amplification fragment is
clearly visible.
9. Pool all 4 PCR samples and extract with an equal volume of PCI. Transfer the aqueous
(upper) phase to a fresh tube, then add 100 µl 10 M ammonium acetate and 500 µl
isopropanol. Turn the tubes over several times to mix, centrifuge (15,000 g) at 4°C
for 20 min, wash twice with 400 µl 75% ethanol, vacuum dry, and resuspend the
pellet in 12 µl TE buffer.
10. Set up two 50-µl digestion reactions using 5 µl of DNA and 4 U of Sau3A I or
BsmF I. Incubate 1 h at 37°C or 65°C, as appropriate.
11. Analyse 10-µl aliquots on a 3% agarose gel. Run in parallel 1 µl of uncut PCR
product. Sau3A I and BsmF I digestion must be completed to >80%.

3.2. Linkers preparation

It is essential to check that ds linkers can be ligated, PCR amplified, and digested
with the anchoring and tagging enzymes. Success with the protocol 3 experiments is a
prerequisite before attempting to prepare a library. The PCR conditions described here
have been optimised for Hybaid thermal reactors (TR1 and Touch Down) working under
control or simulated tube conditions. Different conditions may be used with other
machines. Note that since the target is quite small (90 bp), elongation is performed
at a relatively low temperature.
3.3. Ligating linkers to cDNA

The concentration of ds linkers should be adapted to the amount of cDNA used to
prepare the library. In the original protocol of Velculescu et al., 2 µg (74 pmol) of
ds linkers are used. Considering that, starting from 5 µg of mRNAs, 2 µg of cDNAs with
1-2 kb average size are obtained, the amount of cDNA available for ligation is in the
range of 1.5-3 pmol. Since a large excess of linkers decreases the PCR signal to noise
ratio, we perform ligation with 8 pmol ds linkers for libraries generated from 250 mg
of tissue (~5 µg mRNAs). Starting from 5X10^4-10^5 cells (10-40 ng mRNAs), 0.5 pmol of
ds linkers are used. A lower amount of linkers may allow efficient ligation, but we
have no experience with it.
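
The 1.5-3 pmol figure follows from a mass-to-mole conversion, sketched below. The
average mass of 660 g/mol per base pair is a standard approximation assumed here, and
the function name is only illustrative.

```python
# Sketch: convert a mass of double-stranded DNA into picomoles of molecules.
AVG_BP_MASS = 660.0  # assumed average mass of one base pair (g/mol)


def ds_dna_pmol(mass_ug: float, length_bp: float) -> float:
    """pmol of ds DNA molecules for a given mass (ug) and length (bp)."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return mass_ug * 1e6 / (AVG_BP_MASS * length_bp)


# 2 ug of cDNA with an average size of 1-2 kb
low = ds_dna_pmol(2.0, 2000)   # about 1.5 pmol
high = ds_dna_pmol(2.0, 1000)  # about 3.0 pmol

# Molar excess of linkers over cDNA when 8 pmol of ds linkers are used
print(round(low, 2), round(high, 2), round(8 / high, 1), round(8 / low, 1))
```

With these assumptions, 8 pmol of ds linkers corresponds to a molar excess of roughly
3- to 5-fold over the cDNA, against roughly 25- to 50-fold for 74 pmol.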



Protocol 4. Ligating ds linkers to cDNA

Equipment and Reagents

. Hybrid (1A/1B) and (2A/2B) at 0.5 pmol/µl, obtained from protocol 3, steps 1-2.
. TEN, TE, and LoTE (3 mM Tris-HCl (pH 7.5), 0.2 mM EDTA), stored at 4°C.
. 10X NEB IV reaction buffer and 100X BSA (New England Biolabs).
. T4 DNA ligase 5 U/µl (Life Technologies, ref. 15224-041) and 5X ligation mix; 10
mM ATP.
. MPC (Dynal).
. Water-baths equilibrated at 45°C and 16°C.
. Geiger counter.

Method

1. Once the experiments described in protocol 2 have been carried out, perform 2
additional washes of the beads before ligating ds linkers to the cDNA. Using the
MPC, wash the beads with 200 µl of TEN + BSA^a. Resuspend the beads in 200 µl
of the same buffer (take care to recover the beads completely: mix by repeated
pipetting and scrape the tube wall with the pipette tip), then separate into two
100-µl aliquots: one will be ligated to hybrid (1A/1B), the other will be ligated to
hybrid (2A/2B).
2. Add 10 µl of fresh Dynabeads to two 1.5 ml tubes. These tubes will now be treated
like the two others and will be used as negative controls.
3. Wash the 4 tubes twice with 200 µl of ice-cold TE buffer + BSA^a.
4. Immediately after the last rinse, add to each tube 34 µl of the appropriate mix
containing 8 µl of 5X ligase buffer and 0.5 pmol of hybrid (1A/1B) or hybrid
(2A/2B). Heat 5 min at 45°C, then chill on ice.
5. Add to each tube 4 µl of 10 mM ATP and 2 µl of T4 DNA ligase (final volume: 40
µl). Incubate overnight at 14°C.
6. Wash the beads thoroughly (free linkers will poison the PCR amplification) as
follows: 4 times with 200 µl of TEN + BSA^a and 3 times with 200 µl of 1X NEB IV +
BSA^a. After the first rinse with NEB IV, take care to resuspend the beads completely
(see above) and transfer them to fresh tubes. After the last rinse, check that
radioactivity is still present on the beads but absent from the supernatant.
7. Proceed to protocol 5 or store at 4°C.

^a Final concentration of BSA: 0.1 mg/ml.

After ligation (step 5 in protocol 4), it is very important to wash the beads
extensively in order to remove free ds linkers. If ds linkers not ligated to cDNA
fragments are not thoroughly eliminated from each sample, the library will contain
large amounts (up to 25%) of linker sequences. This will make data acquisition poorly
efficient.

4. Ditags formation
4.1. Release of cDNA tags

Digestion with the tagging enzyme (BsmF I) will release only small DNA fragments from
the oligo(dT) beads. Consequently, much of the radioactivity remains bound to the
beads at this stage. In order to check that extensive rinsing did not cause a great
loss of material, we usually measure bead radioactivity by Cerenkov counting after
BsmF I digestion (the data must be corrected for the efficiency of Cerenkov counting,
~50% of liquid scintillation counting efficiency). For the experiments previously
described on 250, 30, 4, and 0.5 mg of mouse kidney, the amounts of ds cDNA remaining
on the beads reached 450, 67, 13, and 1.8 ng, respectively. Comparison of these data
with those for Sau3A I-released fragments (2.3) indicates that ~6 times lower cDNA
amounts are recovered on beads than in Sau3A I supernatants. The average size of
Sau3A I-cut fragments is predicted to be 256 bp. The fraction that remains bound on
the beads after Sau3A I digestion thus suggests that the average length of the cDNA
formed is ~1.5 kb, which seems quite reasonable.
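
One way to reproduce this estimate from the figures quoted in the text is sketched
below. It rests on assumptions made here for illustration: a random base composition
(so that a 4-bp site occurs on average every 4^4 = 256 bp), and a bead-bound portion
that corresponds to a single terminal fragment of average length.

```python
# Sketch: implied average cDNA length from the released/bound mass ratio.
released_ng = [2800.0, 300.0, 50.0, 10.0]  # Sau3A I supernatants (section 2.3)
bound_ng = [450.0, 67.0, 13.0, 1.8]        # ds cDNA remaining on the beads

SITE_LENGTH = 4                      # Sau3A I recognises the 4-bp site GATC
mean_fragment_bp = 4 ** SITE_LENGTH  # 256 bp, assuming random base composition

# Released-to-bound mass ratio for each tissue amount
ratios = [r / b for r, b in zip(released_ng, bound_ng)]
mean_ratio = sum(ratios) / len(ratios)

# Assumption: the bead-bound part is one terminal fragment of average length,
# so the full cDNA is (ratio + 1) fragments long.
implied_length_bp = (mean_ratio + 1.0) * mean_fragment_bp
print([round(x, 1) for x in ratios], round(mean_ratio, 1), round(implied_length_bp))
```

The per-sample ratios range from about 4 to 6; under the stated assumptions their
mean gives an average cDNA length close to 1.5 kb.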

The whole amount of BsmF I-released material is used for ditag formation, and we never
attempted to quantify it. Nevertheless, the efficiency of BsmF I digestion can be
checked when >4 mg of tissue is used for library generation. In this case, a Geiger
counter can detect radioactivity in the BsmF I supernatant.
4.2. Blunt ending of released cDNA tags

Different enzymes may be used for blunt ending BsmF I-released tags. In their original
study, Velculescu et al. (7) carried out the blunt ending reaction with T4 DNA
polymerase. In more recent applications, Klenow DNA polymerase was used (8) and is now
recommended. It is also our experience that success in library generation is very poor
using T4 DNA polymerase. This likely comes from the fact that blunt ending with T4 DNA
polymerase is carried out at 11°C (12). Such a low temperature allows protruding
termini from unrelated cDNA tags to hybridize, and is thus expected to markedly
decrease the amount of material available for the blunt ending reaction. We have
successfully used Vent and sequencing grade T7 DNA polymerases to generate blunt ends.
The procedure described in protocol 5 involves T7 DNA polymerase.

Protocol 5. Release, blunt ending, and ligation of cDNA tags
Equipment and reagents
. BsmF I, 10X NEB IV buffer and 100X BSA.
. PCI.
. 10 M ammonium acetate.
. Sequencing grade T7 DNA polymerase (Pharmacia Biotech, ref. 27098503).
. 5X mix salt (200 mM Tris-HCl (pH 7.5), 100 mM MgCl2, 250 mM NaCl).
. 2 mM dNTP.
. T4 DNA ligase (5 U/µl) and 5X reaction buffer.
. 100% ethanol, 75% ethanol.
. Geiger counter.
. Water-baths equilibrated at 65°C, 42°C, and 16°C.
Method

1. Remove the supernatant and immediately add to the beads 100 µl of the following
mix: 87 µl H2O, 10 µl 10X NEB IV, 1 µl 100X BSA, 2 µl BsmF I.
2. Incubate 2 h at 65°C. Vortex intermittently.
3. Chill 5 min at room temperature, collect the supernatant (which contains the
released tags) and wash the beads twice with 75 µl of ice-cold TE + BSA^a. Pool all 3
supernatants (250 µl final volume) and add 60 µg glycogen to each of the 4 reaction
tubes. Measure the radioactivity still present on the beads by Cerenkov counting (see
text).
4. Add 250 µl (1 volume) PCI to all 4 supernatants.
5. Vortex, then centrifuge (10,000 g) 10 min at 4°C. Transfer the upper (aqueous)
phase to a fresh tube.
6. Precipitate with high ethanol concentration: add to the aqueous phase 125 µl 10 M
ammonium acetate and 1.125 ml 100% ethanol, and centrifuge (15,000 g) 20 min at
4°C.
7. Wash the pellet twice with 400 µl of 75% ethanol. Vacuum dry and resuspend the
pellet in 10 µl LoTE.
8. Add 15 µl of 1X mix salt to each tube and heat 2 min at 42°C. Maintain the tubes at
42°C and add 25 µl of the following mix: 7.5 µl H2O, 5.5 µl 100 mM DTT, 11 µl
dNTP mix, 1 µl T7 DNA polymerase. Incubate 10 min at 42°C.
9. Pool together the tags ligated to hybrid (1A/1B) and hybrid (2A/2B). Rinse the
tubes with 150 µl LoTE + 20 µg glycogen and add this solution to the pooled reactions
(final volume: 250 µl). You now have 2 tubes (1 sample, 1 negative control).
10. Extract with an equal volume of PCI and precipitate with high ethanol concentration
(see steps 4-6). Resuspend the pellet in 6 µl LoTE.
11. Ligate tags to form ditags by adding to the 6-µl sample: 2 µl 5X mix ligase, 1 µl
10 mM ATP, and 1 µl of T4 DNA ligase (5 U/µl). Proceed similarly for the negative
control, incubate overnight at 16°C, then add 90 µl LoTE.

^a Final concentration of BSA: 0.1 mg/ml.



5. PCR Amplification

Considering the linkers and primers used in our studies, the desired PCR product is
110 bp long (90 bp of linker-derived sequences, and 20 bp of ditag).

5.1. PCR buffers and procedures

For PCR amplification of ditags, we use buffers and conditions different from those
described by Velculescu et al.:

(a) amplification is performed with standard PCR buffers without DMSO and
β-mercaptoethanol. The composition of our 10X PCR buffer is given in protocol 3.
Promega buffer (ref. M1901) works equally well. The conditions used in our assay are
as follows: 100 µM dNTP, 2.5 mM MgCl2 (2 mM with Promega buffer), 0.5 µM primers, and
5 U Taq polymerase. High amounts of primers and Taq polymerase are used (standard
reactions are generally performed with 0.1 µM primers and 1.25 U of enzyme) to ensure
a high yield of ditag production. The dNTP concentration is also slightly higher
(100 vs. 50 µM) than for standard PCR amplifications. Very high dNTP concentrations
should nevertheless be avoided since these are known to increase Taq
polymerase-dependent misincorporations.

(b) as suggested initially (7), we still perform small scale PCR, purify the 110-bp
fragment, then submit it to preparative PCR.
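
The final concentrations quoted in (a) can be checked against the pipetting volumes
of protocol 6 (steps 1 and 3) with a short calculation. The sketch below assumes a
final volume of 100 µl per tube and a Taq stock at 5 U/µl, the concentration listed in
protocol 3; the variable names are illustrative.

```python
# Sketch: final concentrations in the 100-ul reaction of protocol 6 (steps 1 and 3).
FINAL_VOLUME_UL = 100.0  # 46 ul master mix + 4 ul DNA + 50 ul Taq mix


def final_conc(stock_conc: float, volume_ul: float) -> float:
    """Dilute a stock into the final reaction volume (same unit as the stock)."""
    return stock_conc * volume_ul / FINAL_VOLUME_UL


dntp_um = final_conc(1250.0, 8.0)           # 1.25 mM stock, 8 ul
primer_um = final_conc(20.0, 2.5)           # 20 pmol/ul = 20 uM stock, 2.5 ul
mg_from_buffer_mm = final_conc(15.0, 10.0)  # 10X buffer holds 15 mM MgCl2; 5 + 5 ul
mg_added_mm = final_conc(100.0, 1.0)        # 100 mM MgCl2 stock, 1 ul
mgcl2_mm = mg_from_buffer_mm + mg_added_mm
taq_units = 5.0 * 1.0                       # assumed 5 U/ul stock, 1 ul

print(dntp_um, primer_um, mgcl2_mm, taq_units)
```

Under these assumptions the volumes give 100 µM dNTP, 0.5 µM of each primer, 2.5 mM
MgCl2, and 5 U of Taq polymerase, in agreement with the conditions listed above.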
5.2. Number of PCR cycles

Before starting protocol 6, the optimal number of PCR cycles needs to be determined.
This is best accomplished by performing duplicate PCR on 2% of the ligation product
and sampling 7-µl aliquots at different cycles. The number of cycles will of course
depend on the amount of starting material. For 250 mg tissue pieces, a PCR signal
should be obtained with 18 cycles, and the plateau reached at 22-23 cycles. The 110-bp
fragment should be largely predominant (amplified products of 90 and 100 bp are not
unusual). Examples of PCR carried out on ditags generated from tiny amounts of cells
(15,000 to 45,000) are given in Fig. 3. Using such low amounts of cells, the 110-bp
product is no longer predominant. Nevertheless, if maximal yield is achieved with less
than 30 cycles (as obtained from 45,000 cells in Fig. 3), a library which is fairly
representative of the tissue can be generated. Small scale PCR (10 reactions, steps 1-5
in protocol 6) is performed on 2 µl and 4 µl aliquots of ligation product for macro
and microamounts of tissue, respectively.
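
As a first guess before the empirical test, the shift in cycle number with the amount
of starting material can be approximated from the exponential phase of PCR. The sketch
below is an idealisation introduced here, not a measured relationship: it assumes that
ditag yield scales with tissue mass and that amplification efficiency is constant. The
function name and the efficiency parameter are illustrative.

```python
import math


def extra_cycles(reference_amount: float, sample_amount: float,
                 efficiency: float = 1.0) -> float:
    """Additional cycles needed to reach the same product level as the reference.

    efficiency = 1.0 means perfect doubling at every cycle (an idealisation).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    if reference_amount <= 0 or sample_amount <= 0:
        raise ValueError("amounts must be positive")
    fold = reference_amount / sample_amount
    return math.log(fold) / math.log(1.0 + efficiency)


# Reference from the text: a signal at 18 cycles for 250 mg of tissue.
for mg in (250.0, 30.0, 4.0, 0.5):
    print(mg, round(18 + extra_cycles(250.0, mg), 1))
```

With perfect doubling, a 500-fold smaller sample would need about 9 more cycles than
the 250 mg reference. Real efficiencies are lower, so the sampling experiment
described above remains the reference.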



Protocol 6. PCR Amplification of Ditags
Equipment and reagents

. Automated thermal reactor (Hybaid).
. PCR reagents: Taq polymerase, Primers 1 and 2 at 20 pmol/µl, 10X PCR buffer (see
protocol 3), 1.25 mM dNTP, 100 mM MgCl2, and 100 mM DTT.
. β-Agarase and 10X reaction buffer (New England Biolabs, ref. 392L).
. Sau3A I reaction buffer and 100X BSA.
. Low melting point (LMP) agarose (Life Technologies, ref. 15517-022).
. TBE 10X (1.12 M Tris, 1.12 M boric acid, 20 mM EDTA).
. Vertical gel electrophoresis unit, with 20X20 cm plates, 1.5 mm thick spacers, and
preparative comb.
. 10-bp DNA ladder.
. Bromophenol blue loading buffer (0.125% bromophenol blue, 10% ficoll 400, 12.5
mM EDTA) filtered on 0.45-µm membrane.
Method

1. Prepare a master mix with the following reagents: 5 µl 10X PCR Buffer, 1 µl 100
mM MgCl2, 8 µl 1.25 mM dNTP, 2.5 µl primer 1, 2.5 µl primer 2, 27 µl H2O
(multiply these quantities by the number of reaction tubes (usually 12)). Dispense
equal aliquots (46 µl) into PCR tubes and add 4 µl of DNA sample (10 tubes),
negative control, or H2O.
2. Transfer the tubes to the thermal reactor and heat 2 min at 80°C (hot start
conditions).
3. Add to each tube 50 µl of the following mix: 5 µl 10X PCR Buffer, 4 µl 100 mM
DTT, 40 µl H2O, 1 µl Taq polymerase. Add a drop of mineral oil if necessary
for your thermal cycler.
4. Perform PCR at the following temperatures: 30 sec at 94°C, 30 sec at 58°C, and 45
sec at 70°C (27-30 cycles), followed by one cycle with an elongation time of 5 min.
5. Analyse an aliquot of each tube (7 µl) on a 3% agarose gel using the 10-bp DNA
ladder as marker.
6. If the yield is satisfactory (see Fig. 3), pool the 10 PCR tubes in two 1.5 ml
tubes. Add 30 µg glycogen to each tube and extract with PCI. Recover the aqueous phase
and precipitate the DNA by centrifugation after adding 0.1 volume 3 M sodium acetate
and 2.5 volumes 100% ethanol. Wash the pellet with 75% ethanol, vacuum dry,
and resuspend in 300 µl LoTE. Add 75 µl of bromophenol blue loading buffer.
7. Electrophorese the PCR product through a 3% LMP agarose vertical gel (warm the
plates 15 min at 55°C before pouring the gel). Run until the bromophenol blue has
reached the bottom of the gel (~3 h).
8. Cut out the 110-bp fragment from the gel and place the agarose slice in a 2 ml
tube. Add 0.1 volume 10X β-agarase mix, heat 10 min at 70°C, then 10 min at 40°C, and
add β-agarase (6 U/0.2 g of agarose). Incubate 1 h 30 at 40°C. Add 30 µg glycogen.
Extract with PCI and ethanol precipitate as indicated in step 6. Resuspend the
pellet in 300 µl LoTE.
9. After determination of the optimal number of PCR cycles (usually 12), perform
large scale PCR (140-150 reactions) using 2 µl of DNA and the protocol described
in steps 1-4.
10. Pool the PCR reactions in 2 ml tubes. Extract with PCI, ethanol precipitate
(step 6) and wash the pellet twice with 75% ethanol. Resuspend the dry pellets in a
final volume of 470 µl 1X mix Sau3A I.

5.3. Purification and reamplification

The 110-bp PCR product can be purified either on a 12% polyacrylamide (7) or a 3%
agarose slab gel. To avoid overloading and achieve efficient purification, pool no
more than 10-12 PCR reactions on an agarose gel and slice the agarose as close as
possible to the 110-bp fragment. Purification and the optimal number of PCR cycles
should then be tested on duplicate 2 µl aliquots of the purified product. A single
band of 110 bp should now be obtained. The absence of interference from other
amplified products is essential to produce large amounts of the 110-bp fragment.
6. Ditags isolation, concatenation, and cloning of concatemers

6.1. Ditags isolation

Two important points need to be addressed for ditag purification. First, since the
total mass of linkers is nearly five times that of ditags, a high-resolution
polyacrylamide gel is required to thoroughly purify ditags. Second, the short length
of ditags makes them difficult to detect on gel by ethidium bromide staining. This
problem can be overcome by staining the gel with SYBR Green I (or equivalent
products), which ensures a lower detection threshold than ethidium bromide (0.1
instead of 2 ng DNA). To obtain high sensitivity, the loading buffer should not
contain bromophenol blue (bromophenol blue comigrates with ditags). The gel is stained
after migration in a polypropylene or PVC container.

Ditags do not run as a single band on polyacrylamide gel. This may come from subtle
effects of base composition on electrophoretic mobility and/or some wobble in BsmF I
digestion (7). We cut out from the gel all the material ranging from 22 to 26 bp. The
elution procedure is labour intensive but provides ditags that can be concatenated
efficiently.
6.2. Concatenation

Starting from 150 PCR reactions, at least 1 µg of ditags should be obtained. The
optimal ligation time depends on the amount of ditags and on the purity of the
preparation. We usually perform ligation for 2 h. When the yield is high (>1.4 µg), we
set up two ligation reactions, and allow them to proceed for 1 or 2 h. The
corresponding concatemers are then separately purified on an 8% polyacrylamide gel
(see protocol 8).

Protocol 7. Ditags isolation and concatenation

Equipment and Reagents

. Sau3A I, 10X reaction buffer and 100X BSA.
. 50X TAE (2 M Tris, 5.7% glacial acetic acid, 50 mM EDTA).
. 12% polyacrylamide gel: 53.6 ml H2O, 24 ml 40% acrylamide (19:1
acrylamide:bis), 1.6 ml 50X TAE, 800 µl 10% ammonium persulfate, 69 µl
TEMED.
. 10-bp DNA ladder.
. SYBR Green I stain (FMC Bioproducts, ref. 50513).
. T4 DNA Ligase 5 U/µl and 5X ligation mix; 10 mM ATP.
. Vertical gel electrophoresis unit, with 20X20 cm plates, 1.5 mm thick spacers, and
preparative comb.
. Xylene cyanole loading buffer (0.125% xylene cyanole, 10% ficoll 400, 12.5 mM
EDTA).
. Spin X microcentrifuge tubes (Costar, ref. 8160).

Method

1. Save 1 µl of the 110-bp DNA fragment (step 10 of protocol 6) and digest the
remainder by adding 5 µl 100X BSA and 25 µl Sau3A I. Incubate overnight at
37°C in a hot-air incubator.
2. Check for Sau3A I digestion: analyse 1 µl of uncut DNA, and 1 µl and 3 µl of the
Sau3A I digestion (use bromophenol blue loading buffer and xylene cyanole loading
buffer for uncut and Sau3A I-digested DNA, respectively) on a 3% agarose gel. Most
(>80%) of the 110-bp fragment should have been digested, and a faint band
corresponding to the ditags can now be detected at ~25 bp.
3. Add 125 µl xylene cyanole loading buffer to the digested DNA sample and load on
a preparative 12% polyacrylamide vertical gel in 1X TAE. Run at 30 mA until the
bromophenol blue of the size marker is 12 cm away from the well.
4. Transfer the gel to SYBR Green I stain at 1:10,000 dilution in 1X TAE. Wrap the
container in aluminium foil and stain the gel for 20 min. Visualise on a UV box.
5. Cut out the ditag band (24-26 bp) and transfer the acrylamide slices to 0.5 ml tubes
(for a 20-cm wide gel use 8 tubes). Pierce the bottom of the 0.5 ml tubes with an
18-gauge needle. Place the tubes in 2 ml tubes and spin 5 min at 10,000 g. Prepare the
following elution buffer for each tube: 475 µl LoTE, 25 µl 10 M ammonium acetate,
5 µg glycogen. Add 250 µl elution buffer to each 0.5 ml tube and centrifuge
again. Discard the 0.5 ml tubes and add 250 µl elution buffer directly to each 2 ml
tube. Incubate overnight at 37°C in a hot-air incubator.
6. Prepare a series of 16 SpinX microcentrifuge tubes: add 20 µg glycogen to each
collection tube. Transfer the content of each 2 ml tube (~600 µl) to two SpinX
microcentrifuge tubes. Spin 5 min at 13,000 g. Transfer 350 µl of eluted solution
into 1.5 ml tubes (10-11 tubes), extract with PCI, and perform high concentration
ethanol precipitation. Wash twice with 75% ethanol, vacuum dry, and pool all pellets in
15 µl LoTE.
7. Measure the amount of purified ditags by dot quantitation (12) using 1 µl of sample.
Total DNA at this stage is usually 1 µg, but a library can still be generated with
400 ng.
8. Ligate ditags to form concatemers: add to your sample (14 µl) 4.4 µl 5X mix ligase,
2.2 µl 10 mM ATP, and 2.2 µl concentrated (5 U/µl) T4 DNA ligase.
9. Incubate 2 h at 16°C. Stop the reaction by adding 5 µl of bromophenol blue loading
buffer and store at -20°C.

6.3. Purification of concatemers

Concatemers are heated at 45°C for 5 min immediately before loading on the gel to
separate unligated cohesive ends. Concatemers form a smear on the gel from about
100 bp to several kbp (Fig. 4) and can be easily detected using SYBR Green I stain.
All fragments >300 bp (i.e. with 25 or more tags) are potentially interesting for
library construction. We usually cut out fragments of 350-600, 600-2000, and >2000 bp
and generate a first library using the 600-2000 bp DNA fragments. Longer fragments
will be more informative but are expected to be cloned with poor efficiency.
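
The correspondence between concatemer size and tag content can be sketched as
follows. The repeat unit of 24 bp per ditag is an assumption chosen here because it
reproduces the figure of 25 tags for 300 bp; isolated ditags actually range from 22
to 26 bp. The function name is illustrative.

```python
def estimated_tags(concatemer_bp: int, ditag_unit_bp: int = 24) -> int:
    """Approximate number of tags in a concatemer (two tags per ditag unit)."""
    if ditag_unit_bp <= 0:
        raise ValueError("ditag_unit_bp must be positive")
    return round(2 * concatemer_bp / ditag_unit_bp)


# Size limits of the gel slices cut out for library construction
for size in (300, 350, 600, 2000):
    print(size, estimated_tags(size))
```

On this basis, a clone from the 600-2000 bp fraction would carry roughly 50 to 170
tags.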



Protocol 8. Purification and cloning of concatemers
Equipment and reagents
. SYBR Green I stain.
. Vertical gel electrophoresis unit, with 20X20 cm plates, 1.5 mm thick spacers, and
20-well comb.
. 8% polyacrylamide gel: 61.6 ml H2O, 16 ml acrylamide 40% (37.5:1
acrylamide:bis), 1.6 ml 50X TAE, 800 µl 10% ammonium persulfate, 69 µl
TEMED.
. 100-bp DNA ladder (Life Technologies, ref. 15628-019).
. T4 DNA ligase 1 U/µl (Life Technologies, ref. 15224-017) and 5X reaction buffer.
. 10 mM ATP.
. pBluescript II, linearized with BamH I and dephosphorylated.
. E. coli XL2 Blue ultracompetent cells (Stratagene, ref. 200150).
20 Method 

10 I . Heat sample 5 min at 45°C and load into one lane of a 20 wells 8% acryiamide gel. 
Run at 30 mA until bromophenol blue is 10-12 cm from the well. 
25 2. Stain the gel with SYBR Green I as described in protocol 7 and visualise on UV 

box. 

3. Concatemers form a smear on gel with a range from about 100 bp to the gel well 
15 (Fig. 4).Cut out regions containing DNA of 350-600, 600-2000, and >2000 bp. 
Purify separately DNA of each three slices as described in section 5-6 of protocol 7 
(a 1-h incubation period of gel slices in LoTE/ammonium acetate solution is suffi- 
cient). Resuspend the pellet in 6 ui LoTE and generate a first library using 
concatemers of 600-2000 bp. 
20 4. Mix 6 ul of concatemers and 2 ul (25 ng) of BamH I-cut pBluescript H. Heat 5 min 
at 45°C then chill on ice. 

5. Add 3 Jul 5X mix ligase, 1 ul H 2 0, 1.5 ul 10 mM ATP, and 1.5 ul T4 DNA ligase (1 
U/ ul). Mix and incubate overnight at 1 6°C. 

6. Add 20 µg glycogen and 285 µl LoTE and extract with PCI. Ethanol precipitate,
wash twice with 75% ethanol, vacuum dry, and resuspend the pellet in 12 µl LoTE.

7. Transform E. coli XL2 Blue ultracompetent cells with 1/3 (4 µl) of the ligation
reaction according to the manufacturer's instructions. Plate different volumes (5 µl,
10 µl, 20 µl, 40 µl) of the transformation mix onto Petri dishes containing Luria agar
supplemented with ampicillin, X-gal, and IPTG. Incubate 15-16 h at 37°C. Save the
remaining (~900 µl) transformation solution (add 225 µl 80% glycerol, mix
intermittently for 5 min, and store at -80°C). It will be used to plate additional
bacteria if the library appears correct.

8. Count insert-free (i.e. blue) and recombinant (i.e. white) bacterial colonies on each 
plate. The fraction of recombinant colonies should be >50%, and their total number 
should be in the range of 10,000-60,000 for 1 ml of transformation mix. 
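
The counting in step 8 reduces to two numbers per plate: the fraction of white colonies, and the white-colony count scaled to 1 ml of transformation mix. The Python sketch below illustrates that arithmetic; the function names and the example counts are illustrative assumptions, not part of the protocol.

```python
def library_stats(white, blue, plated_ul, total_ul=1000.0):
    """Summarise one plate of the blue/white screen.

    white, blue : colony counts (white = recombinant, blue = insert-free)
    plated_ul   : volume of transformation mix spread on the plate (µl)
    total_ul    : volume the count is scaled to (1 ml, as in step 8)
    Returns (recombinant fraction, recombinants per total_ul).
    """
    if plated_ul <= 0:
        raise ValueError("plated_ul must be positive")
    if white + blue == 0:
        raise ValueError("no colonies counted")
    fraction = white / (white + blue)
    scaled = white * total_ul / plated_ul
    return fraction, scaled


def library_looks_correct(fraction, scaled):
    """Apply the step 8 criteria: >50% recombinants, 10,000-60,000 per ml."""
    return fraction > 0.5 and 10_000 <= scaled <= 60_000


# Hypothetical plate: 150 white and 50 blue colonies from 10 µl plated.
fraction, scaled = library_stats(white=150, blue=50, plated_ul=10)
print(f"{fraction:.0%} recombinant, {scaled:.0f} recombinants per ml")
```

Averaging the scaled counts from the plates that carry well-separated colonies gives a steadier estimate than relying on a single plate.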

6.4. Cloning of concatemers

Concatemers can be cloned and sequenced in a vector of choice. We
currently clone concatemers in pBluescript II linearized with BamH I and
dephosphorylated by calf intestinal alkaline phosphatase treatment. Any vector with a
BamH I site in the multiple cloning site is suitable. Velculescu et al. (7, 8) use
pZero-1 from Invitrogen, which only allows recombinants to grow (DNA insertion into
the multiple cloning site disrupts a lethal gene). The competent cells and the
transformation procedure (heat shock or electroporation) can also be changed according
to your facilities. Whatever your choice, it is important to use bacterial cells allowing
very high cloning efficiency (> 5 × 10^9 transformants/µg of supercoiled DNA). An
important point is to evaluate the number of clones obtained in the library (protocol 8,
step 8). Since a large number (1,000-2,000) of clones will be sequenced, the total
number of recombinants should be > 10,000.

Library screening can be performed by PCR or DNA miniprep. In
our hands, DNA miniprep provides more reproducible amounts of DNA than PCR
and avoids false-positive signals. We use the Qiaprep 8 miniprep kit (Qiagen, ref. 27144),
which allows 96 minipreps to be performed in ~2 hours. Plasmid DNA is eluted from
the Qiagen columns with 100 µl of elution buffer; 5 µl are digested to evaluate insert
size and, if the insert is > 200 bp, 5 µl are used directly for DNA sequencing.

7. Data analysis
7.1. Software 

Once cloned concatemers have been sequenced, tags must be
extracted, quantified, and identified through possible data bank matches. Two
programs have been written for these purposes.

SAGE software (7) was written in Visual Basic and runs on
personal computers under Microsoft Windows. It extracts tags from text
sequence files, quantifies them, allows several libraries to be compared, and provides
links to GenBank data downloaded from CD-ROM flat files or over the Internet. The latter
function enables rapid identification of tags originating from characterised genes or
cDNAs. However, the description of EST sequences is truncated, which makes it
necessary to consult the individual GenBank reports. SAGE software also includes
several simulation tools which make it possible, for example, to assess the significance
of differences observed between two libraries, and to evaluate the sequencing accuracy.

A second program (CbC, for Cell by Cell; J. Marti et al., University of
Montpellier-2, France) is intended to store and retrieve data from SADE
experiments. Scripts for data extraction are developed in C under a Unix
environment, and the database management system is implemented in Access®. Text files
are concatenated to yield the working file from which tag sequences are extracted and
enumerated. Treatment of raw data involves identification of vector contaminants and
of truncated and repeated ditags (see below). For experiments on human, mouse and rat
cell samples, the tags are searched in the non-redundant set of sequences provided by
the UniGene collection. These data can be downloaded from the anonymous FTP site:
ncbi.nlm.gov/repository/unigene/. Useful files are Hs.data.Z and Hs.seq.uniq.Z for
man, and similar files for mouse and rat. The results are displayed in a table which
provides the sequence of each tag, its number of occurrences with the matching cluster
number (Hs# for Homo sapiens, Mm# for Mus musculus and Rn# for Rattus
norvegicus), and other data extracted from the source files, including GenBank
accession numbers. For human genes, when available, a link is automatically established
with GeneCards (http://bioinfo.weizmann.ac.il/cards/), which provides additional
information.
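
The extraction and enumeration step described above can be sketched in a few lines of Python. This is not the SAGE or CbC code; it is a minimal illustration under stated assumptions: concatemer reads are plain strings, ditags are the fragments found between Sau3A I sites (GATC), each ditag carries one 10-bp tag at each end (the second on the opposite strand), and the 20-28 length window is borrowed from the ditag size quoted in the claims. All function names are hypothetical.

```python
from collections import Counter

ANCHOR = "GATC"   # Sau3A I recognition sequence
TAG_LEN = 10      # transcript-specific bases per tag
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def extract_ditags(concatemer, min_len=20, max_len=28):
    """Split a read on the anchoring site and keep ditag-sized fragments.

    The length window is an assumption; adjust it to the ditag size
    actually purified.
    """
    fragments = concatemer.upper().split(ANCHOR)
    return [f for f in fragments if min_len <= len(f) <= max_len]


def ditag_to_tags(ditag):
    """Return the two tags of a ditag, both read 5'->3' from their anchor."""
    return ditag[:TAG_LEN], revcomp(ditag[-TAG_LEN:])


def count_tags(concatemers):
    """Enumerate tag occurrences over a collection of concatemer reads."""
    counts = Counter()
    for read in concatemers:
        for ditag in extract_ditags(read):
            counts.update(ditag_to_tags(ditag))
    return counts


# Toy read with two ditags (made-up sequences).
read = "GATC" + "ACGTACGTACGGGGGAAAAA" + "GATC" + "CCCCCAAAAAGTACGTACGT" + "GATC"
print(count_tags([read]).most_common())
```

A real pipeline would also screen for vector sequence, linker-derived tags and duplicate ditags before counting, as discussed in section 7.2.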
7.2. Library validity

7.2.1. Insert length

Initial assessment of library quality is obtained by screening
for insert length. A good library should contain >60% of clones with inserts > 240 bp
(20 tags). Only one DNA strand is sequenced, since accuracy is obtained from the
number of tags recorded rather than from the quality of individual runs. Depending
on your budget and sequencing facilities, either all clones or only the most informative
ones will be sequenced. It should be noted that the average length of inserts does not
match that of the gel-purified concatemers. Although we usually extract 600-2000 bp
long concatemers, most of the clones have inserts <600 bp, and we never get inserts
>800 bp. Several reasons can explain this paradoxical result. Long
inserts are known to be cloned with poor efficiency. In addition, they are expected to
contain several repeats or inverted repeats, and may thus form unstable plasmid
constructs. Supporting this interpretation, it has been demonstrated (14), and we have
also observed, that efficient removal of linkers (which represent up to 20% of total
tags in poor libraries) increases the average length of cloned inserts. At any rate, it is
worth emphasising that similar biological information is obtained from libraries with
short and long inserts.

7.2.2. Gene expression pattern 

The basic pattern of gene expression in eukaryotic cells was
established long ago by kinetic analysis of mRNA-cDNA hybridization (15, 16). In a
"typical" mammalian cell, the total mRNA mass consists of 300,000 molecules,
corresponding to ~12,000 transcripts which divide into three abundance classes. A
very small number of mRNAs (~10) are expressed at exceedingly high levels
(3,000-15,000 copies/cell). A larger number of mRNAs (~500) reach an expression
level in the range of 100-500 copies/cell. Finally, the majority of mRNAs (> 10,000) are
poorly expressed (10-100 copies/cell). This basic pattern should be observed in SAGE
or SADE libraries. However, before translating tag abundance into a definite gene
expression profile, the data must be scrutinised for artefacts encountered in library
construction.

7.2.3. Occurrence of linker-derived sequences

As mentioned above, some libraries display a high amount of linker
sequences. If this amount is 20% or more, sequencing will be quite expensive, and it is
better to start again from the RNA sample. Library contamination with 10-15% of
linker sequences is acceptable, 5-10% is good, and < 5% is excellent. In addition to
the two perfect linker matches (GTCCCTGTGC (SEQ ID NO:26) and
GTCCCTTCCG (SEQ ID NO:27)), reading ambiguities can lead to sequences with
one mismatch. These linker-like sequences are also easily identified since, assuming
efficient enzymatic cleavage, the probability of having adjacent Sau3A I and BsmF I
sites in the concatemers is normally zero. Linker and linker-like sequences can be
automatically discarded using SAGE or CbC software, and their relative amounts can
be used to evaluate the sequencing accuracy (see 7. 7.).
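
As an illustration of how linker and linker-like tags could be flagged, the Python sketch below compares each 10-bp tag with the two linker sequences quoted above and allows one mismatch. It is not the SAGE or CbC implementation; the function names and the returned labels are assumptions made for the example.

```python
# The two perfect linker matches quoted in the text (SEQ ID NO:26 and 27).
LINKER_TAGS = ("GTCCCTGTGC", "GTCCCTTCCG")


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def classify_tag(tag, max_mismatch=1):
    """Label a tag as 'linker', 'linker-like' or 'tag'."""
    tag = tag.upper()
    best = min(hamming(tag, linker) for linker in LINKER_TAGS)
    if best == 0:
        return "linker"
    if best <= max_mismatch:
        return "linker-like"
    return "tag"


def linker_contamination(tags):
    """Fraction of tags that are linker or linker-like sequences."""
    if not tags:
        raise ValueError("empty tag list")
    hits = sum(classify_tag(t) != "tag" for t in tags)
    return hits / len(tags)


# Made-up example: one perfect linker match among four tags.
tags = ["GTCCCTGTGC", "ACATTCCTTA", "ACATTCCTTA", "TGACCAAGGC"]
print(f"linker contamination: {linker_contamination(tags):.0%}")
```

The ratio of linker-like to perfect-match counts is the quantity the text suggests using as a rough indicator of sequencing accuracy.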

7.2.4. Duplicate ditags 

Another category of sequences that must be deleted are those
corresponding to duplicate ditags. Indeed, except for particular tissues (e.g. lactating
gland or laying hen oviduct) in which one or a very small number of transcripts
constitutes the bulk of the mRNA mass, the probability for any two tags to be found
several times in the same ditag is very small. Elimination of repeated ditags will
therefore correct for preferential PCR amplification of some targets, and for picking
several bacterial colonies originating from the same clone. Most ditags (>95%)
generally occur only once when the library is constructed from macroamounts of
tissue. For microlibraries, the percentage of unique ditags is generally lower. When it is
no longer compatible (<75%) with efficient data acquisition, it is recommended to
start again from the first (small scale) PCR (see protocol 6). Duplicate ditags are
automatically retrieved from the sequence files by the SAGE and CbC software.
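
Duplicate-ditag removal can be sketched as follows in Python. The sketch assumes that a ditag and its reverse complement represent the same molecule (a ditag can be cloned in either orientation) and reports the number of distinct ditags divided by the total as the percentage of unique ditags; both choices, and all names, are assumptions for illustration rather than a description of the SAGE or CbC software.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def canonical(ditag):
    """Orientation-independent key: the smaller of the ditag and its reverse complement."""
    d = ditag.upper()
    return min(d, revcomp(d))


def remove_duplicate_ditags(ditags):
    """Keep the first occurrence of each ditag, whatever its orientation."""
    seen = set()
    unique = []
    for d in ditags:
        key = canonical(d)
        if key not in seen:
            seen.add(key)
            unique.append(d)
    return unique


def unique_fraction(ditags):
    """Distinct ditags divided by total ditags."""
    if not ditags:
        raise ValueError("empty ditag list")
    return len(remove_duplicate_ditags(ditags)) / len(ditags)


# Made-up example: the first ditag appears three times, once reversed.
d1 = "ACGTACGTACGGGGGAAAAA"
d2 = "CCCCCAAAAAGTACGTACGT"
ditags = [d1, d1, revcomp(d1), d2]
print(remove_duplicate_ditags(ditags), unique_fraction(ditags))
```

A value of `unique_fraction` below 0.75 corresponds to the situation in which the text recommends going back to the first PCR.
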
7.3. Number of tags to be sequenced 

The number of tags to be analysed will obviously depend on the
application and tissue source. Reducing tissue complexity
through isolation of defined cell populations markedly diminishes the
minimum number of tags required for accurate analysis, and allows molecular and
physiological phenotypes to be better correlated.

Delineation of the most expressed genes (>500 copies/cell) in one
tissue, and comparison with their expression level in another one, requires
sequencing only a few thousand tags (Fig. 5). When 5,000 tags are analysed, 300 will
be detected at least 3 times. Since automated sequencers can read 48-96 templates
simultaneously, 10,000 tags will be recorded from 5-10 gels if the average number of
tags/clone is ~20.
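
The sequencing workload quoted above follows from simple division, as this short Python sketch shows. The function name is hypothetical and the defaults simply restate the figures in the text (about 20 tags per clone, 48 or 96 templates per gel).

```python
def sequencing_workload(n_tags, tags_per_clone=20, lanes_per_gel=(48, 96)):
    """Return (clones to sequence, gels needed for each gel format).

    Gels are reported as fractional numbers; round up for planning.
    """
    clones = n_tags / tags_per_clone
    gels = tuple(clones / lanes for lanes in lanes_per_gel)
    return clones, gels


clones, gels = sequencing_workload(10_000)
print(f"{clones:.0f} clones, {gels[1]:.1f} to {gels[0]:.1f} gels")
```

For 10,000 tags this gives 500 clones, which is roughly 5 gels at 96 templates per gel and roughly 10 gels at 48.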

The most difficult projects are those aiming to compare gene
expression profiles in the same tissue under two physiological or pathological
conditions. Differentially expressed genes could belong to any of the three abundance
classes and, furthermore, they can be either up- or down-regulated. A reasonable
number of tags to be sequenced would be in the range of 30,000-50,000. The
probability (P) of detecting a sequence of a given abundance can be calculated from
the Clarke and Carbon (17) equation (N = ln(1-P)/ln(1-x/n)), where N is the number of
sequences analysed, x is the expression level, and n is the total number of mRNAs per
cell (~300,000). Thus, the analysis of 30,000 and 50,000 tags will provide a 95%
confidence level of detecting transcripts expressed at 30 and 18 copies/cell,
respectively. Most up-regulation processes will therefore be assessed. For example, tags
corresponding to poorly expressed transcripts may be detected 1 and > 5 times in
control and experimental conditions, respectively. However, we have to be aware that
the assessment of down-regulation processes will be less exhaustive. It will
only concern tags present > 5-10 times in the control condition, which excludes from
the analysis part of the poorly expressed transcripts.
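
The Clarke and Carbon relation is easy to evaluate numerically. The Python sketch below computes N for a given P, x and n, and the inverse (P for a given N); the function names are hypothetical and n defaults to the ~300,000 mRNAs per cell assumed in the text.

```python
import math


def tags_needed(p, copies_per_cell, mrnas_per_cell=300_000):
    """N = ln(1 - P) / ln(1 - x/n): tags to sequence to detect, with
    probability p, a transcript present at x copies among n mRNAs."""
    return math.log(1 - p) / math.log(1 - copies_per_cell / mrnas_per_cell)


def detection_probability(n_tags, copies_per_cell, mrnas_per_cell=300_000):
    """P = 1 - (1 - x/n)^N: the same relation solved for P."""
    return 1 - (1 - copies_per_cell / mrnas_per_cell) ** n_tags


for copies in (30, 18):
    print(copies, "copies/cell:", round(tags_needed(0.95, copies)), "tags")
```

With P = 0.95 the computed values fall just below 30,000 tags for 30 copies/cell and just below 50,000 tags for 18 copies/cell, in line with the figures given above.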

In Table I above, which corresponds to the characterisation
of the most abundant nuclear transcripts in the mouse outer medullary collecting duct
(OMCD) and establishes their differential expression in the medullary thick ascending
limb (MTAL), the two left columns correspond to the data illustrated in Figure 5, and
provide the abundance of each tag in the two libraries. The third column provides the
sequence of the tags. The right column indicates the results of individual BLAST
searches in GenBank, carried out using a 14-bp sequence (the Sau3A I recognition
sequence plus the 10 bp specific for each tag).

As is evident from the above, the invention is not at all limited to its
embodiment, implementation and application which have just been defined more
explicitly; it embraces, on the contrary, all the variants which may occur to a specialist
in this field, without departing from the framework or the scope of the present
invention.

CLAIMS

1°) Method of obtaining a library of tags able to define a specific
state of a biological sample, characterised in that it comprises the following successive
steps:

(1) extracting in a single-step mRNA from a small amount of a
biological sample using oligo(dT)25 covalently bound to paramagnetic beads,

(2) generating a double strand cDNA library, from said mRNA
according to the following steps:

* synthesising the 1st strand of said cDNA by reverse transcription of
said mRNA template into a 1st complementary single-strand cDNA, using a reverse
transcriptase lacking RNase H activity,

* synthesising the 2nd strand of said cDNA by nick translation of the
mRNA, in the mRNA-cDNA hybrid formed by an E. coli DNA polymerase,

(3) cleaving the obtained cDNAs using the restriction endonuclease
Sau3A I as anchoring enzyme,

(4) separating the cleaved cDNAs in two aliquots,

(5) ligating the cDNA contained in each of said two aliquots via said
Sau3A I restriction site to a linker consisting of one double-strand cDNA molecule
having one of the following formulas:
GATCGTCCC-X1 or GATCGTCCC-X2,

wherein X1 and X2, which comprise 30-37 nucleotides and are
different, include a 20-25 bp PCR priming site with a Tm of 55°C-65°C, and

wherein GATCGTCCC (SEQ ID NO:1) corresponds to a Sau3A I
restriction site joined to a BsmF I restriction site,

(6) digesting the products obtained in step (5) with the tagging
enzyme BsmF I and releasing linkers with anchored short piece of cDNA
corresponding to a transcript-specific tag, said digestion generating BsmF I tags
specific of the initial mRNA,

(7) blunt-ending said BsmF I tags with a DNA polymerase, preferably
T7 DNA polymerase or Vent polymerase and mixing the tags ligated with the
different linkers,

(8) ligating the tags obtained in step (7) to form ditags with a DNA
ligase,

(9) amplifying the ditags obtained in step (8) with primers
comprising 20-25 bp and having a Tm of 55°-65°C,

(10) isolating the ditags having between 20 and 28 bp from the
amplification products obtained in step (9) by digesting said amplification products
with the anchoring enzyme Sau3A I and separating the digested products on an
appropriate gel electrophoresis,

(11) ligating the ditags obtained in step (10) to form concatemers,
purifying said concatemers and separating the concatemers having more than 300 bp,

(12) cloning and sequencing said concatemers and

(13) analysing the different obtained tags.

2°) Method according to claim 1, characterised in that in step (2),
said synthesis of the 1st strand of said cDNA is performed with Moloney Murine
Leukaemia Virus reverse transcriptase (M-MLV RT), and oligo(dT)25 as primers.

3°) Method according to claim 1 or to claim 2, characterised in that
the linkers of step (5) are preferably hybrid DNA molecules formed from linkers 1A
and 1B or from linkers 2A and 2B, having the following formulas:
linker 1A: 5'-TTTTGCCAGGTCACTCAAGTCGGTCATTCATGTCAGCACAGGGAC-3'
(SEQ ID NO:2)

linker 1B: 5'-GATCGTCCCTGTGCTGACATGAATGACCGACTTGAGTGACCTGGCA-3'
(SEQ ID NO:3), or

linker 2A: 5'-TTTTTGCTCAGGCTCAAGGCTCGTCTAATCACAGTCGGAAGGGAC-3'
(SEQ ID NO:4)

linker 2B: 5'-GATCGTCCCTTCCGACTGTGATTAGACGAGCCTTGAGCCTGAGCAA-3'
(SEQ ID NO:5).

4°) Method according to claims 1 to 3, characterised in that the
amount of each linker in step (5) is at most of 8-10 pmol and preferably comprised
between 0.5 pmol and 8 pmol for initial amounts of respectively 10-40 ng of mRNAs
and 5 µg of mRNAs.

5°) Method according to claims 1 to 4, characterised in that the
primers of step (9) have preferably the following formulas:
5'-GCCAGGTCACTCAAGTCGGTCATT-3' (SEQ ID NO:6)
5'-TGCTCAGGCTCAAGGCTCGTCTA-3' (SEQ ID NO:7).

6°) Method according to claims 1 to 5, characterised in that the
biological sample of step (1) preferably comprises < 5 × 10^6 cells, corresponding to at
most 50 ng of total RNA or 1 µg of poly(A) RNA.

7°) Method according to claims 1 to 6, characterised in that said
tissue sample is from kidney, more specifically from nephron segments corresponding
to about 15,000 to 45,000 cells, corresponding to 0.15-0.45 µg of total RNA.

8°) Use of a library of tags obtained according to the method of
claims 1 to 7, for assessing the state of a biological sample.

9°) Use of the tags obtained according to claims 1 to 7 as probes.

10°) Method of determination of a gene expression profile,
characterised in that it comprises:
- performing steps (1) to (13) according to claim 1 and
- translating cDNA tag abundance in gene expression profile.

11°) Method according to claim 10, characterised in that the gene 
expression profile obtained in mouse outer medullary collecting duct (OMCD) and in 
mouse medullary thick ascending limb (MTAL) is as specified in Table I. 

12°) A kit useful for detection of gene expression profile,
characterised in that the presence of a cDNA tag obtained from the mRNA extracted from a
biological sample, is indicative of expression of a gene having said tag sequence at an
appropriate position, i.e. immediately adjacent to the most 3' Sau3A I site in said
cDNA, the kit comprising further to usual buffers for cDNA synthesis, restriction
enzyme digestion, ligation and amplification,
- containers containing linker consisting of one double-strand cDNA
molecule having one of the following formulas:

GATCGTCCC-X1 or GATCGTCCC-X2,

wherein X1 and X2, which comprise 30-37 nucleotides and are
different, include a 20-25 bp PCR priming site with a Tm of 55°C-65°C, and
wherein GATCGTCCC (SEQ ID NO:1) corresponds to a Sau3A I
restriction site joined to a BsmF I restriction site, and
- containers containing primers comprising 20-25 bp and having a
Tm of 55°-65°C

linker 1A: 5'-TTTTGCCAGGTCACTCAAGTCGGTCATTCATGTCAGCACAGGGAC-3'
(SEQ ID NO:2)

linker 1B: 5'-GATCGTCCCTGTGCTGACATGAATGACCGACTTGAGTGACCTGGCA-3'
(SEQ ID NO:3)
or

linker 2A: 5'-TTTTTGCTCAGGCTCAAGGCTCGTCTAATCACAGTCGGAAGGGAC-3'
(SEQ ID NO:4)

linker 2B: 5'-GATCGTCCCTTCCGACTGTGATTAGACGAGCCTTGAGCCTGAGCAA-3'
(SEQ ID NO:5), and

- containers containing the following primers:

5'-GCCAGGTCACTCAAGTCGGTCATT-3' (SEQ ID NO:6)

5'-TGCTCAGGCTCAAGGCTCGTCTA-3' (SEQ ID NO:7).
SEQUENCE LISTING

<110> COMMISSARIAT A L'ENERGIE ATOMIQUE
      CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE

<120> MICROASSAY FOR SERIAL ANALYSIS OF GENE EXPRESSION AND
      APPLICATIONS THEREOF.

<130> BLOcp263EP51

<140>
<141>

<160> 27

<170> PatentIn Ver. 2.1

<210> 1
<211> 9
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: LINKER
<400> 1
gatcgtccc 9

<210> 2
<211> 45
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: LINKER 1A
<400> 2
ttttgccagg tcactcaagt cggtcattca tgtcagcaca gggac 45

<210> 3
<211> 46
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: LINKER 1B
<400> 3
gatcgtccct gtgctgacat gaatgaccga cttgagtgac ctggca 46

WO 00/44936    PCT/IB00/00111

<210> 4
<211> 45
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: LINKER 2A
<400> 4
tttttgctca ggctcaaggc tcgtctaatc acagtcggaa gggac 45

<210> 5
<211> 46
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: LINKER 2B
<400> 5
gatcgtccct tccgactgtg attagacgag ccttgagcct gagcaa 46

<210> 6
<211> 24
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: PRIMER
<400> 6
gccaggtcac tcaagtcggt catt 24

<210> 7
<211> 23
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: PRIMER
<400> 7
tgctcaggct caaggctcgt cta 23

<210> 8
<211> 10
<212> DNA
<213> Mus sp.
<400> 8
gtggcagtgg 10

<210> 9
<211> 10
<212> DNA
<213> Mus sp.
<400> 9
ttataatttg 10

<210> 10
<211> 10
<212> DNA
<213> Mus sp.
<400> 10
-ggcagtggg 10

<210> 11
<211> 10
<212> DNA
<213> Mus sp.
<400> 11
tgactccctc 10

<210> 12
<211> 10
<212> DNA
<213> Mus sp.
<400> 12
aagtttaaa- 10

<210> 13
<211> 10
<212> DNA
<213> Mus sp.
<400> 13
agcaagcagg 10

<210> 14
<211> 10
<212> DNA
<213> Mus sp.
<400> 14
caaaaagcta 10

<210> 15
<211> 10
<212> DNA
<213> Mus sp.
<400> 15
acattcctta 10

<210> 16
<211> 10
<212> DNA
<213> Mus sp.
<400> 16
sccgaccgca 10

<210> 17
<211> 10
<212> DNA
<213> Mus sp.
<400> 17
cagaagaagt 10

<210> 18
<211> 10
<212> DNA
<213> Mus sp.
<400> 18
aaataaagtt 10

<210> 19
<211> 10
<212> DNA
<213> Mus sp.
<400> 19
agaagcagtg 10

<210> 20
<211> 10
<212> DNA
<213> Mus sp.
<400> 20
tgatgccctc 10

<210> 21
<211> 10
<212> DNA
<213> Mus sp.
<400> 21
aggctactac 10

<210> 22
<211> 10
<212> DNA
<213> Mus sp.
<400> 22
gctcattgga 10

<210> 23
<211> 10
<212> DNA
<213> Mus sp.
<400> 23
gctttcagca 10

<210> 24
<211> 10
<212> DNA
<213> Mus sp.
<400> 24
gtgactgggt 10

<210> 25
<211> 10
<212> DNA
<213> Mus sp.
<400> 25
tgaccaaggc 10

<210> 26
<211> 10
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: linker
<400> 26
gtccctgtgc 10

<210> 27
<211> 10
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: linker
<400> 27
gtcccttccg 10